Introduction
In my vectorization using .NET APIs blog, I describe the SIMD datatypes Vector64<T> and Vector128<T> that operate on the ‘Arm64 hardware intrinsic’ APIs present under the System.Runtime.Intrinsics.Arm.AdvSimd and System.Runtime.Intrinsics.Arm.AdvSimd.Arm64 classes. In this post I will describe those hardware intrinsic APIs by showing sample code usage along with examples and the generated Arm64 code. This will help people understand these APIs so they can use them to optimize .NET code that targets Arm64. Since there are 360 APIs, describing all of them in a single post would be overwhelming, so I have divided them among 8 blogs and will demonstrate 45 APIs in each. This is part 5 of that blog series. You can check out my previous blogs at:
Most of the API descriptions are adapted from the Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile. You can also refer to the descriptions of the SIMD and floating-point instructions on the Arm developer docs page.
The blog page is programmatically generated and might contain mistakes. If you find a mistake, please leave a comment and I will address it.
APIs covered
1. MultiplyDoublingBySelectedScalarSaturateHigh
Vector64<short> MultiplyDoublingBySelectedScalarSaturateHigh(Vector64<short> left, Vector64<short> right, byte rightIndex)
This method multiplies each element of the left vector by the element at index rightIndex of the right vector, doubles the results, places the most significant half of each final result into a vector, and returns that vector. If any result overflows, it is saturated.
private Vector64<short> MultiplyDoublingBySelectedScalarSaturateHighTest(Vector64<short> left, Vector64<short> right, byte rightIndex)
{
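// The lane index is passed as the constant 0, so the JIT can encode it directly in the instruction (v1.h[0] in the listing below).
return AdvSimd.MultiplyDoublingBySelectedScalarSaturateHigh(left, right, 0);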
}
// left = <1000, 500, 13, 14>
// right = <500, 22, 23, 24>
// rightIndex = 0
// Result = <15, 7, 0, 0>
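To make the arithmetic concrete, here is a minimal scalar sketch of what one call computes. It is plain C# rather than the intrinsic, uses arrays in place of Vector64<short>, and assumes the SQDMULH semantics from the Arm reference manual: each doubled product is shifted right by 16 bits and saturated to the short range. SqdmulhByLane is a hypothetical helper name.

```csharp
using System;

short[] left = { 1000, 500, 13, 14 };
short[] right = { 500, 22, 23, 24 };

short[] result = SqdmulhByLane(left, right, rightIndex: 0);
Console.WriteLine(string.Join(", ", result)); // 15, 7, 0, 0

// Hypothetical scalar model of MultiplyDoublingBySelectedScalarSaturateHigh for short lanes.
static short[] SqdmulhByLane(short[] left, short[] right, int rightIndex)
{
    short scalar = right[rightIndex];          // the single lane every left element is multiplied by
    var result = new short[left.Length];
    for (int i = 0; i < left.Length; i++)
    {
        long doubled = 2L * left[i] * scalar;  // exact doubled product; cannot overflow in 64 bits
        long high = doubled >> 16;             // keep the most significant half
        result[i] = (short)Math.Clamp(high, short.MinValue, short.MaxValue); // saturate
    }
    return result;
}
```

Because the model runs on any machine, it can serve as a reference when checking AdvSimd results in unit tests.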
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector64<short> MultiplyDoublingBySelectedScalarSaturateHigh(Vector64<short> left, Vector128<short> right, byte rightIndex)
Vector64<int> MultiplyDoublingBySelectedScalarSaturateHigh(Vector64<int> left, Vector64<int> right, byte rightIndex)
Vector64<int> MultiplyDoublingBySelectedScalarSaturateHigh(Vector64<int> left, Vector128<int> right, byte rightIndex)
Vector128<short> MultiplyDoublingBySelectedScalarSaturateHigh(Vector128<short> left, Vector64<short> right, byte rightIndex)
Vector128<short> MultiplyDoublingBySelectedScalarSaturateHigh(Vector128<short> left, Vector128<short> right, byte rightIndex)
Vector128<int> MultiplyDoublingBySelectedScalarSaturateHigh(Vector128<int> left, Vector64<int> right, byte rightIndex)
Vector128<int> MultiplyDoublingBySelectedScalarSaturateHigh(Vector128<int> left, Vector128<int> right, byte rightIndex)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MultiplyDoublingBySelectedScalarSaturateHighTest(System.Runtime.Intrinsics.Vector64`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16],ubyte):System.Runtime.Intrinsics.Vector64`1[Int16]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;* V02 arg2 [V02 ] ( 0, 0 ) ubyte -> zero-ref
;# V03 OutArgs [V03 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
sqdmulh v16.4h, v0.4h, v1.h[0]
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
2. MultiplyDoublingSaturateHigh
Vector64<short> MultiplyDoublingSaturateHigh(Vector64<short> left, Vector64<short> right)
This method multiplies the corresponding elements of the left and right vectors, doubles the results, places the most significant half of each result in a result vector, and returns that vector. If any result overflows, it is saturated.
private Vector64<short> MultiplyDoublingSaturateHighTest(Vector64<short> left, Vector64<short> right)
{
return AdvSimd.MultiplyDoublingSaturateHigh(left, right);
}
// left = <1000, 500, 13, 14>
// right = <500, 22, 23, 24>
// Result = <15, 0, 0, 0>
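One way to read this API is as a Q15 fixed-point multiply: if a short holds a value in [-1, 1) scaled by 32768, doubling the product and keeping the high 16 bits gives the scaled product. The plain C# sketch below (hypothetical helper DoublingMulHigh) models one lane under the Arm SQDMULH semantics and shows the single input pair that saturates.

```csharp
using System;

Console.WriteLine(DoublingMulHigh(1000, 500));      // 15, matching lane 0 of the example
Console.WriteLine(DoublingMulHigh(16384, 16384));   // 8192: 0.5 * 0.5 = 0.25 in Q15
Console.WriteLine(DoublingMulHigh(-32768, -32768)); // 32767: -1 * -1 = 1 is not representable, so it saturates

// Hypothetical scalar model of one 16-bit lane of MultiplyDoublingSaturateHigh.
static short DoublingMulHigh(short a, short b)
{
    long doubled = 2L * a * b;                                                // exact doubled product
    return (short)Math.Clamp(doubled >> 16, short.MinValue, short.MaxValue); // high half, saturated
}
```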
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector64<int> MultiplyDoublingSaturateHigh(Vector64<int> left, Vector64<int> right)
Vector128<short> MultiplyDoublingSaturateHigh(Vector128<short> left, Vector128<short> right)
Vector128<int> MultiplyDoublingSaturateHigh(Vector128<int> left, Vector128<int> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MultiplyDoublingSaturateHighTest(System.Runtime.Intrinsics.Vector64`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16]):System.Runtime.Intrinsics.Vector64`1[Int16]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
sqdmulh v16.4h, v0.4h, v1.4h
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
3. MultiplyDoublingSaturateHighScalar
Vector64<short> MultiplyDoublingSaturateHighScalar(Vector64<short> left, Vector64<short> right)
This method multiplies the 0th element of the left vector by the 0th element of the right vector, doubles the result, places its most significant half in element 0 of the result vector, and returns that vector. All other elements of the result vector are set to 0. If the result overflows, it is saturated.
private Vector64<short> MultiplyDoublingSaturateHighScalarTest(Vector64<short> left, Vector64<short> right)
{
return AdvSimd.Arm64.MultiplyDoublingSaturateHighScalar(left, right);
}
// left = <11, 12, 13, 14>
// right = <10210, 20020, 230, 240>
// Result = <3, 0, 0, 0>
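The scalar variant reads only lane 0 of each input. A minimal plain C# model, using arrays in place of Vector64<short> and assuming (as the example shows) that every other result lane is zero; ScalarDoublingMulHigh is a hypothetical name:

```csharp
using System;

short[] left = { 11, 12, 13, 14 };
short[] right = { 10210, 20020, 230, 240 };

short[] result = ScalarDoublingMulHigh(left, right);
Console.WriteLine(string.Join(", ", result)); // 3, 0, 0, 0

// Hypothetical model of MultiplyDoublingSaturateHighScalar for short lanes.
static short[] ScalarDoublingMulHigh(short[] left, short[] right)
{
    var result = new short[left.Length];     // zero-initialised, so lanes 1..3 stay 0
    long doubled = 2L * left[0] * right[0];  // only lane 0 takes part
    result[0] = (short)Math.Clamp(doubled >> 16, short.MinValue, short.MaxValue);
    return result;
}
```

Here 2 * 11 * 10210 = 224620, and 224620 >> 16 = 3.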
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd.Arm64
Vector64<int> MultiplyDoublingSaturateHighScalar(Vector64<int> left, Vector64<int> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MultiplyDoublingSaturateHighScalarTest(System.Runtime.Intrinsics.Vector64`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16]):System.Runtime.Intrinsics.Vector64`1[Int16]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
sqdmulh h16, h0, h1
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
4. MultiplyDoublingScalarBySelectedScalarSaturateHigh
Vector64<short> MultiplyDoublingScalarBySelectedScalarSaturateHigh(Vector64<short> left, Vector64<short> right, byte rightIndex)
This method multiplies the 0th element of the left vector by the element at index rightIndex of the right vector, doubles the result, places the most significant half of the truncated result in element 0 of the result vector, and returns that vector. All other elements of the result vector are set to 0. If the result overflows, it is saturated.
private Vector64<short> MultiplyDoublingScalarBySelectedScalarSaturateHighTest(Vector64<short> left, Vector64<short> right, byte rightIndex)
{
return AdvSimd.Arm64.MultiplyDoublingScalarBySelectedScalarSaturateHigh(left, right, 0);
}
// left = <11, 12, 13, 14>
// right = <10000, 22, 23, 24>
// rightIndex = 0
// Result = <3, 0, 0, 0>
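A sketch of the same idea with a lane index, in plain C# with arrays standing in for vectors (the helper name is hypothetical). Lane 0 of left is multiplied by right[rightIndex]; switching the index to 1 shows how a small product disappears entirely from the high half, since 2 * 11 * 22 = 484 is below 65536.

```csharp
using System;

short[] left = { 11, 12, 13, 14 };
short[] right = { 10000, 22, 23, 24 };

Console.WriteLine(string.Join(", ", ScalarByLaneMulHigh(left, right, 0))); // 3, 0, 0, 0
Console.WriteLine(string.Join(", ", ScalarByLaneMulHigh(left, right, 1))); // 0, 0, 0, 0

// Hypothetical model of MultiplyDoublingScalarBySelectedScalarSaturateHigh for short lanes.
static short[] ScalarByLaneMulHigh(short[] left, short[] right, int rightIndex)
{
    var result = new short[left.Length];              // lanes 1..3 stay 0
    long doubled = 2L * left[0] * right[rightIndex];  // lane 0 of left times the selected lane of right
    result[0] = (short)Math.Clamp(doubled >> 16, short.MinValue, short.MaxValue);
    return result;
}
```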
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd.Arm64
Vector64<short> MultiplyDoublingScalarBySelectedScalarSaturateHigh(Vector64<short> left, Vector128<short> right, byte rightIndex)
Vector64<int> MultiplyDoublingScalarBySelectedScalarSaturateHigh(Vector64<int> left, Vector64<int> right, byte rightIndex)
Vector64<int> MultiplyDoublingScalarBySelectedScalarSaturateHigh(Vector64<int> left, Vector128<int> right, byte rightIndex)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MultiplyDoublingScalarBySelectedScalarSaturateHighTest(System.Runtime.Intrinsics.Vector64`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16],ubyte):System.Runtime.Intrinsics.Vector64`1[Int16]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;* V02 arg2 [V02 ] ( 0, 0 ) ubyte -> zero-ref
;# V03 OutArgs [V03 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
sqdmulh h16, h0, v1.h[0]
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
5. MultiplyDoublingWideningAndAddSaturateScalar
Vector64<int> MultiplyDoublingWideningAndAddSaturateScalar(Vector64<int> addend, Vector64<short> left, Vector64<short> right)
This method multiplies the 0th signed integer elements of the left and right vectors, doubles the result, adds it to the 0th element of the addend vector, and returns the sum in element 0 of the result vector; the other element is set to 0. The result elements are twice as wide as the elements that are multiplied. If overflow occurs, the result is saturated.
private Vector64<int> MultiplyDoublingWideningAndAddSaturateScalarTest(Vector64<int> addend, Vector64<short> left, Vector64<short> right)
{
return AdvSimd.Arm64.MultiplyDoublingWideningAndAddSaturateScalar(addend, left, right);
}
// addend = <11, 12>
// left = <11, 12, 13, 14>
// right = <21, 22, 23, 24>
// Result = <473, 0>
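In the Arm pseudocode for SQDMLAL, the doubled product is first saturated to 32 bits and the accumulation is then saturated again. A minimal plain C# model under that reading, with arrays in place of vectors and hypothetical helper names:

```csharp
using System;

int[] addend = { 11, 12 };
short[] left = { 11, 12, 13, 14 };
short[] right = { 21, 22, 23, 24 };

int[] result = MulDoubleAddScalar(addend, left, right);
Console.WriteLine(string.Join(", ", result)); // 473, 0

// Saturating conversion from 64-bit to 32-bit.
static int SatInt32(long value) => (int)Math.Clamp(value, int.MinValue, int.MaxValue);

// Hypothetical model of MultiplyDoublingWideningAndAddSaturateScalar (short -> int).
static int[] MulDoubleAddScalar(int[] addend, short[] left, short[] right)
{
    var result = new int[addend.Length];              // lane 1 stays 0
    int product = SatInt32(2L * left[0] * right[0]);  // first saturation: the doubled product
    result[0] = SatInt32((long)addend[0] + product);  // second saturation: the accumulation
    return result;
}
```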
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd.Arm64
Vector64<long> MultiplyDoublingWideningAndAddSaturateScalar(Vector64<long> addend, Vector64<int> left, Vector64<int> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MultiplyDoublingWideningAndAddSaturateScalarTest(System.Runtime.Intrinsics.Vector64`1[Int32],System.Runtime.Intrinsics.Vector64`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16]):System.Runtime.Intrinsics.Vector64`1[Int32]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
; V02 arg2 [V02,T02] ( 3, 3 ) simd8 -> d2 HFA(simd8)
;# V03 OutArgs [V03 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
sqdmlal s0, h1, h2
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 20, prolog size 8
6. MultiplyDoublingWideningAndSubtractSaturateScalar
Vector64<int> MultiplyDoublingWideningAndSubtractSaturateScalar(Vector64<int> minuend, Vector64<short> left, Vector64<short> right)
This method multiplies the 0th signed integer elements of the left and right vectors, doubles the result, subtracts it from the 0th element of the minuend vector, and returns the difference in element 0 of the result vector; the other element is set to 0. The result elements are twice as wide as the elements that are multiplied. If overflow occurs, the result is saturated.
private Vector64<int> MultiplyDoublingWideningAndSubtractSaturateScalarTest(Vector64<int> minuend, Vector64<short> left, Vector64<short> right)
{
return AdvSimd.Arm64.MultiplyDoublingWideningAndSubtractSaturateScalar(minuend, left, right);
}
// minuend = <11, 12>
// left = <11, 12, 13, 14>
// right = <21, 22, 23, 24>
// Result = <-451, 0>
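The subtracting form follows the same pattern, taking the saturated product away from minuend[0]. In this plain C# sketch (hypothetical helper names, arrays in place of vectors), the second call shows the difference clamping at int.MinValue instead of wrapping around:

```csharp
using System;

int[] minuend = { 11, 12 };
short[] left = { 11, 12, 13, 14 };
short[] right = { 21, 22, 23, 24 };

Console.WriteLine(string.Join(", ", MulDoubleSubScalar(minuend, left, right))); // -451, 0

int[] low = { int.MinValue + 1, 0 };
Console.WriteLine(MulDoubleSubScalar(low, left, right)[0]); // -2147483648

// Saturating conversion from 64-bit to 32-bit.
static int SatInt32(long value) => (int)Math.Clamp(value, int.MinValue, int.MaxValue);

// Hypothetical model of MultiplyDoublingWideningAndSubtractSaturateScalar (short -> int).
static int[] MulDoubleSubScalar(int[] minuend, short[] left, short[] right)
{
    var result = new int[minuend.Length];              // lane 1 stays 0
    int product = SatInt32(2L * left[0] * right[0]);   // saturated doubled product
    result[0] = SatInt32((long)minuend[0] - product);  // saturated difference
    return result;
}
```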
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd.Arm64
Vector64<long> MultiplyDoublingWideningAndSubtractSaturateScalar(Vector64<long> minuend, Vector64<int> left, Vector64<int> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MultiplyDoublingWideningAndSubtractSaturateScalarTest(System.Runtime.Intrinsics.Vector64`1[Int32],System.Runtime.Intrinsics.Vector64`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16]):System.Runtime.Intrinsics.Vector64`1[Int32]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
; V02 arg2 [V02,T02] ( 3, 3 ) simd8 -> d2 HFA(simd8)
;# V03 OutArgs [V03 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
sqdmlsl s0, h1, h2
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 20, prolog size 8
7. MultiplyDoublingWideningLowerAndAddSaturate
Vector128<int> MultiplyDoublingWideningLowerAndAddSaturate(Vector128<int> addend, Vector64<short> left, Vector64<short> right)
This method multiplies the corresponding signed integer elements of the left and right vectors, doubles the results, adds them to the corresponding elements of the addend vector, and returns the accumulated result. The destination vector elements are twice as wide as the elements that are multiplied. If overflow occurs with any of the results, those results are saturated.
private Vector128<int> MultiplyDoublingWideningLowerAndAddSaturateTest(Vector128<int> addend, Vector64<short> left, Vector64<short> right)
{
return AdvSimd.MultiplyDoublingWideningLowerAndAddSaturate(addend, left, right);
}
// addend = <11, 12, 13, 14>
// left = <11, 12, 13, 14>
// right = <21, 22, 23, 24>
// Result = <473, 540, 611, 686>
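Applied to all four lanes, the same multiply, double, and accumulate steps give the vector form. A plain C# model with hypothetical helper names and arrays standing in for Vector64<short> and Vector128<int>, assuming the two-step saturation of SQDMLAL:

```csharp
using System;

int[] addend = { 11, 12, 13, 14 };
short[] left = { 11, 12, 13, 14 };
short[] right = { 21, 22, 23, 24 };

int[] result = MulDoubleAddLower(addend, left, right);
Console.WriteLine(string.Join(", ", result)); // 473, 540, 611, 686

// Saturating conversion from 64-bit to 32-bit.
static int SatInt32(long value) => (int)Math.Clamp(value, int.MinValue, int.MaxValue);

// Hypothetical model of MultiplyDoublingWideningLowerAndAddSaturate (4 x short -> 4 x int).
static int[] MulDoubleAddLower(int[] addend, short[] left, short[] right)
{
    var result = new int[addend.Length];
    for (int i = 0; i < result.Length; i++)
    {
        int product = SatInt32(2L * left[i] * right[i]); // widened, doubled, saturated
        result[i] = SatInt32((long)addend[i] + product);
    }
    return result;
}
```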
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector128<long> MultiplyDoublingWideningLowerAndAddSaturate(Vector128<long> addend, Vector64<int> left, Vector64<int> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MultiplyDoublingWideningLowerAndAddSaturateTest(System.Runtime.Intrinsics.Vector128`1[Int32],System.Runtime.Intrinsics.Vector64`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16]):System.Runtime.Intrinsics.Vector128`1[Int32]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd16 -> d0 HFA(simd16)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
; V02 arg2 [V02,T02] ( 3, 3 ) simd8 -> d2 HFA(simd8)
;# V03 OutArgs [V03 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
sqdmlal v0.4s, v1.4h, v2.4h
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 20, prolog size 8
8. MultiplyDoublingWideningLowerAndSubtractSaturate
Vector128<int> MultiplyDoublingWideningLowerAndSubtractSaturate(Vector128<int> minuend, Vector64<short> left, Vector64<short> right)
This method multiplies the corresponding signed integer elements of the left and right vectors, doubles the results, subtracts them from the corresponding elements of the minuend vector, and returns the result. The destination vector elements are twice as wide as the elements that are multiplied. If overflow occurs with any of the results, those results are saturated.
private Vector128<int> MultiplyDoublingWideningLowerAndSubtractSaturateTest(Vector128<int> minuend, Vector64<short> left, Vector64<short> right)
{
return AdvSimd.MultiplyDoublingWideningLowerAndSubtractSaturate(minuend, left, right);
}
// minuend = <11, 12, 13, 14>
// left = <11, 12, 13, 14>
// right = <21, 22, 23, 24>
// Result = <-451, -516, -585, -658>
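A matching plain C# model of the subtracting form (helper names hypothetical). The second call shows why the order of the two saturations matters, assuming the SQDMLSL pseudocode: with both inputs at -32768 the doubled product is clamped to int.MaxValue before the subtraction, so the result is -2147483647 rather than int.MinValue.

```csharp
using System;

int[] minuend = { 11, 12, 13, 14 };
short[] left = { 11, 12, 13, 14 };
short[] right = { 21, 22, 23, 24 };

Console.WriteLine(string.Join(", ", MulDoubleSubLower(minuend, left, right))); // -451, -516, -585, -658

short[] min = { short.MinValue, 0, 0, 0 };
Console.WriteLine(MulDoubleSubLower(new int[4], min, min)[0]); // -2147483647

// Saturating conversion from 64-bit to 32-bit.
static int SatInt32(long value) => (int)Math.Clamp(value, int.MinValue, int.MaxValue);

// Hypothetical model of MultiplyDoublingWideningLowerAndSubtractSaturate (4 x short -> 4 x int).
static int[] MulDoubleSubLower(int[] minuend, short[] left, short[] right)
{
    var result = new int[minuend.Length];
    for (int i = 0; i < result.Length; i++)
    {
        int product = SatInt32(2L * left[i] * right[i]); // saturated before the subtraction
        result[i] = SatInt32((long)minuend[i] - product);
    }
    return result;
}
```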
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector128<long> MultiplyDoublingWideningLowerAndSubtractSaturate(Vector128<long> minuend, Vector64<int> left, Vector64<int> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MultiplyDoublingWideningLowerAndSubtractSaturateTest(System.Runtime.Intrinsics.Vector128`1[Int32],System.Runtime.Intrinsics.Vector64`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16]):System.Runtime.Intrinsics.Vector128`1[Int32]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd16 -> d0 HFA(simd16)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
; V02 arg2 [V02,T02] ( 3, 3 ) simd8 -> d2 HFA(simd8)
;# V03 OutArgs [V03 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
sqdmlsl v0.4s, v1.4h, v2.4h
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 20, prolog size 8
9. MultiplyDoublingWideningLowerByScalarAndAddSaturate
Vector128<int> MultiplyDoublingWideningLowerByScalarAndAddSaturate(Vector128<int> addend, Vector64<short> left, Vector64<short> right)
This method multiplies each element of the left vector by the 0th element of the right vector, doubles the results, adds the products to the corresponding elements of the addend vector, and returns the accumulated result. As the example below shows, the result element type (int) is twice as wide as the input element type (short). If overflow occurs with any of the results, those results are saturated.
private Vector128<int> MultiplyDoublingWideningLowerByScalarAndAddSaturateTest(Vector128<int> addend, Vector64<short> left, Vector64<short> right)
{
return AdvSimd.MultiplyDoublingWideningLowerByScalarAndAddSaturate(addend, left, right);
}
// addend = <11, 12, 13, 14>
// left = <11, 12, 13, 14>
// right = <21, 22, 23, 24>
// Result = <473, 516, 559, 602>
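Compared with the non-scalar form, the only change is that right[0] is broadcast to every lane. A plain C# sketch with hypothetical helper names and arrays in place of vectors; lanes 1 to 3 of right are never read:

```csharp
using System;

int[] addend = { 11, 12, 13, 14 };
short[] left = { 11, 12, 13, 14 };
short[] right = { 21, 22, 23, 24 };

Console.WriteLine(string.Join(", ", MulDoubleAddByScalar(addend, left, right))); // 473, 516, 559, 602

// Saturating conversion from 64-bit to 32-bit.
static int SatInt32(long value) => (int)Math.Clamp(value, int.MinValue, int.MaxValue);

// Hypothetical model of MultiplyDoublingWideningLowerByScalarAndAddSaturate (short -> int).
static int[] MulDoubleAddByScalar(int[] addend, short[] left, short[] right)
{
    short scalar = right[0]; // only lane 0 of right is used
    var result = new int[addend.Length];
    for (int i = 0; i < result.Length; i++)
        result[i] = SatInt32((long)addend[i] + SatInt32(2L * left[i] * scalar));
    return result;
}
```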
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector128<long> MultiplyDoublingWideningLowerByScalarAndAddSaturate(Vector128<long> addend, Vector64<int> left, Vector64<int> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MultiplyDoublingWideningLowerByScalarAndAddSaturateTest(System.Runtime.Intrinsics.Vector128`1[Int32],System.Runtime.Intrinsics.Vector64`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16]):System.Runtime.Intrinsics.Vector128`1[Int32]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd16 -> d0 HFA(simd16)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
; V02 arg2 [V02,T02] ( 3, 3 ) simd8 -> d2 HFA(simd8)
;# V03 OutArgs [V03 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
sqdmlal v0.4s, v1.4h, v2.h[0]
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 20, prolog size 8
10. MultiplyDoublingWideningLowerByScalarAndSubtractSaturate
Vector128<int> MultiplyDoublingWideningLowerByScalarAndSubtractSaturate(Vector128<int> minuend, Vector64<short> left, Vector64<short> right)
This method multiplies each element of the left vector by the 0th element of the right vector, doubles the results, subtracts the products from the corresponding elements of the minuend vector, and returns the result. As the example below shows, the result element type (int) is twice as wide as the input element type (short). If overflow occurs with any of the results, those results are saturated.
private Vector128<int> MultiplyDoublingWideningLowerByScalarAndSubtractSaturateTest(Vector128<int> minuend, Vector64<short> left, Vector64<short> right)
{
return AdvSimd.MultiplyDoublingWideningLowerByScalarAndSubtractSaturate(minuend, left, right);
}
// minuend = <11, 12, 13, 14>
// left = <11, 12, 13, 14>
// right = <21, 22, 23, 24>
// Result = <-451, -492, -533, -574>
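The subtracting form computes minuend[i] - 2 * left[i] * right[0] in each lane, with saturation. A plain C# sketch (hypothetical helper names, arrays in place of vectors):

```csharp
using System;

int[] minuend = { 11, 12, 13, 14 };
short[] left = { 11, 12, 13, 14 };
short[] right = { 21, 22, 23, 24 };

Console.WriteLine(string.Join(", ", MulDoubleSubByScalar(minuend, left, right))); // -451, -492, -533, -574

// Saturating conversion from 64-bit to 32-bit.
static int SatInt32(long value) => (int)Math.Clamp(value, int.MinValue, int.MaxValue);

// Hypothetical model of MultiplyDoublingWideningLowerByScalarAndSubtractSaturate (short -> int).
static int[] MulDoubleSubByScalar(int[] minuend, short[] left, short[] right)
{
    var result = new int[minuend.Length];
    for (int i = 0; i < result.Length; i++)
        result[i] = SatInt32((long)minuend[i] - SatInt32(2L * left[i] * right[0])); // right[0] in every lane
    return result;
}
```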
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector128<long> MultiplyDoublingWideningLowerByScalarAndSubtractSaturate(Vector128<long> minuend, Vector64<int> left, Vector64<int> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MultiplyDoublingWideningLowerByScalarAndSubtractSaturateTest(System.Runtime.Intrinsics.Vector128`1[Int32],System.Runtime.Intrinsics.Vector64`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16]):System.Runtime.Intrinsics.Vector128`1[Int32]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd16 -> d0 HFA(simd16)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
; V02 arg2 [V02,T02] ( 3, 3 ) simd8 -> d2 HFA(simd8)
;# V03 OutArgs [V03 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
sqdmlsl v0.4s, v1.4h, v2.h[0]
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 20, prolog size 8
11. MultiplyDoublingWideningLowerBySelectedScalarAndAddSaturate
Vector128<int> MultiplyDoublingWideningLowerBySelectedScalarAndAddSaturate(Vector128<int> addend, Vector64<short> left, Vector64<short> right, byte rightIndex)
This method multiplies each element of the left vector by the element at index rightIndex of the right vector, doubles the results, adds the products to the corresponding elements of the addend vector, and returns the accumulated result. As the example below shows, the result element type (int) is twice as wide as the input element type (short). If overflow occurs with any of the results, those results are saturated.
private Vector128<int> MultiplyDoublingWideningLowerBySelectedScalarAndAddSaturateTest(Vector128<int> addend, Vector64<short> left, Vector64<short> right, byte rightIndex)
{
return AdvSimd.MultiplyDoublingWideningLowerBySelectedScalarAndAddSaturate(addend, left, right, 0);
}
// addend = <11, 12, 13, 14>
// left = <11, 12, 13, 14>
// right = <21, 22, 23, 24>
// rightIndex = 0
// Result = <473, 516, 559, 602>
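With a selectable lane, the model takes right[rightIndex] instead of right[0]. In this plain C# sketch (hypothetical helper names, arrays in place of vectors), rightIndex = 0 reproduces the example and rightIndex = 3 multiplies by 24 instead:

```csharp
using System;

int[] addend = { 11, 12, 13, 14 };
short[] left = { 11, 12, 13, 14 };
short[] right = { 21, 22, 23, 24 };

Console.WriteLine(string.Join(", ", MulDoubleAddByLane(addend, left, right, 0))); // 473, 516, 559, 602
Console.WriteLine(string.Join(", ", MulDoubleAddByLane(addend, left, right, 3))); // 539, 588, 637, 686

// Saturating conversion from 64-bit to 32-bit.
static int SatInt32(long value) => (int)Math.Clamp(value, int.MinValue, int.MaxValue);

// Hypothetical model of MultiplyDoublingWideningLowerBySelectedScalarAndAddSaturate (short -> int).
static int[] MulDoubleAddByLane(int[] addend, short[] left, short[] right, int rightIndex)
{
    short scalar = right[rightIndex]; // the selected lane is used for every element
    var result = new int[addend.Length];
    for (int i = 0; i < result.Length; i++)
        result[i] = SatInt32((long)addend[i] + SatInt32(2L * left[i] * scalar));
    return result;
}
```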
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector128<int> MultiplyDoublingWideningLowerBySelectedScalarAndAddSaturate(Vector128<int> addend, Vector64<short> left, Vector128<short> right, byte rightIndex)
Vector128<long> MultiplyDoublingWideningLowerBySelectedScalarAndAddSaturate(Vector128<long> addend, Vector64<int> left, Vector64<int> right, byte rightIndex)
Vector128<long> MultiplyDoublingWideningLowerBySelectedScalarAndAddSaturate(Vector128<long> addend, Vector64<int> left, Vector128<int> right, byte rightIndex)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MultiplyDoublingWideningLowerBySelectedScalarAndAddSaturateTest(System.Runtime.Intrinsics.Vector128`1[Int32],System.Runtime.Intrinsics.Vector64`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16],ubyte):System.Runtime.Intrinsics.Vector128`1[Int32]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd16 -> d0 HFA(simd16)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
; V02 arg2 [V02,T02] ( 3, 3 ) simd8 -> d2 HFA(simd8)
;* V03 arg3 [V03 ] ( 0, 0 ) ubyte -> zero-ref
;# V04 OutArgs [V04 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
sqdmlal v0.4s, v1.4h, v2.h[0]
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 20, prolog size 8
12. MultiplyDoublingWideningLowerBySelectedScalarAndSubtractSaturate
Vector128<int> MultiplyDoublingWideningLowerBySelectedScalarAndSubtractSaturate(Vector128<int> minuend, Vector64<short> left, Vector64<short> right, byte rightIndex)
This method multiplies each element of the left vector by the element at index rightIndex of the right vector, doubles the results, subtracts the products from the corresponding elements of the minuend vector, and returns the result. As the example below shows, the result element type (int) is twice as wide as the input element type (short). If overflow occurs with any of the results, those results are saturated.
private Vector128<int> MultiplyDoublingWideningLowerBySelectedScalarAndSubtractSaturateTest(Vector128<int> minuend, Vector64<short> left, Vector64<short> right, byte rightIndex)
{
return AdvSimd.MultiplyDoublingWideningLowerBySelectedScalarAndSubtractSaturate(minuend, left, right, 0);
}
// minuend = <11, 12, 13, 14>
// left = <11, 12, 13, 14>
// right = <21, 22, 23, 24>
// rightIndex = 0
// Result = <-451, -492, -533, -574>
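The subtracting counterpart, sketched the same way in plain C# (hypothetical helper names, arrays in place of vectors). rightIndex = 0 reproduces the example; rightIndex = 2 uses 23 as the multiplier:

```csharp
using System;

int[] minuend = { 11, 12, 13, 14 };
short[] left = { 11, 12, 13, 14 };
short[] right = { 21, 22, 23, 24 };

Console.WriteLine(string.Join(", ", MulDoubleSubByLane(minuend, left, right, 0))); // -451, -492, -533, -574
Console.WriteLine(string.Join(", ", MulDoubleSubByLane(minuend, left, right, 2))); // -495, -540, -585, -630

// Saturating conversion from 64-bit to 32-bit.
static int SatInt32(long value) => (int)Math.Clamp(value, int.MinValue, int.MaxValue);

// Hypothetical model of MultiplyDoublingWideningLowerBySelectedScalarAndSubtractSaturate (short -> int).
static int[] MulDoubleSubByLane(int[] minuend, short[] left, short[] right, int rightIndex)
{
    short scalar = right[rightIndex];
    var result = new int[minuend.Length];
    for (int i = 0; i < result.Length; i++)
        result[i] = SatInt32((long)minuend[i] - SatInt32(2L * left[i] * scalar));
    return result;
}
```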
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector128<int> MultiplyDoublingWideningLowerBySelectedScalarAndSubtractSaturate(Vector128<int> minuend, Vector64<short> left, Vector128<short> right, byte rightIndex)
Vector128<long> MultiplyDoublingWideningLowerBySelectedScalarAndSubtractSaturate(Vector128<long> minuend, Vector64<int> left, Vector64<int> right, byte rightIndex)
Vector128<long> MultiplyDoublingWideningLowerBySelectedScalarAndSubtractSaturate(Vector128<long> minuend, Vector64<int> left, Vector128<int> right, byte rightIndex)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MultiplyDoublingWideningLowerBySelectedScalarAndSubtractSaturateTest(System.Runtime.Intrinsics.Vector128`1[Int32],System.Runtime.Intrinsics.Vector64`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16],ubyte):System.Runtime.Intrinsics.Vector128`1[Int32]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd16 -> d0 HFA(simd16)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
; V02 arg2 [V02,T02] ( 3, 3 ) simd8 -> d2 HFA(simd8)
;* V03 arg3 [V03 ] ( 0, 0 ) ubyte -> zero-ref
;# V04 OutArgs [V04 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
sqdmlsl v0.4s, v1.4h, v2.h[0]
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 20, prolog size 8
13. MultiplyDoublingWideningSaturateLower
Vector128<int> MultiplyDoublingWideningSaturateLower(Vector64<short> left, Vector64<short> right)
This method multiplies the corresponding elements of the left and right vectors, doubles the results, stores them in a vector, and returns that vector. If overflow occurs with any of the results, those results are saturated. As the example below shows, the result element type (int) is twice as wide as the input element type (short).
private Vector128<int> MultiplyDoublingWideningSaturateLowerTest(Vector64<short> left, Vector64<short> right)
{
return AdvSimd.MultiplyDoublingWideningSaturateLower(left, right);
}
// left = <11, 12, 13, 14>
// right = <21, 22, 23, 24>
// Result = <462, 528, 598, 672>
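Without an accumulator, the widening form is the doubled product saturated to 32 bits. Since short * short always fits in an int, the only input pair that overflows after doubling is -32768 * -32768, which the second call in this plain C# sketch demonstrates (the helper name is hypothetical; arrays stand in for vectors):

```csharp
using System;

short[] left = { 11, 12, 13, 14 };
short[] right = { 21, 22, 23, 24 };

Console.WriteLine(string.Join(", ", MulDoubleWidenLower(left, right))); // 462, 528, 598, 672

short[] min = { short.MinValue, short.MinValue, 0, 0 };
short[] mix = { short.MinValue, short.MaxValue, 0, 0 };
Console.WriteLine(string.Join(", ", MulDoubleWidenLower(min, mix))); // 2147483647, -2147418112, 0, 0

// Hypothetical model of MultiplyDoublingWideningSaturateLower (4 x short -> 4 x int).
static int[] MulDoubleWidenLower(short[] left, short[] right)
{
    var result = new int[left.Length];
    for (int i = 0; i < result.Length; i++)
        result[i] = (int)Math.Clamp(2L * left[i] * right[i], int.MinValue, int.MaxValue); // saturate to int
    return result;
}
```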
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector128<long> MultiplyDoublingWideningSaturateLower(Vector64<int> left, Vector64<int> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MultiplyDoublingWideningSaturateLowerTest(System.Runtime.Intrinsics.Vector64`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16]):System.Runtime.Intrinsics.Vector128`1[Int32]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
sqdmull v16.4s, v0.4h, v1.4h
mov v0.16b, v16.16b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
14. MultiplyDoublingWideningSaturateLowerByScalar
Vector128<int> MultiplyDoublingWideningSaturateLowerByScalar(Vector64<short> left, Vector64<short> right)
This method multiplies each element of the left vector by the 0th element of the right vector, doubles the results, stores them in a vector, and returns that vector. If overflow occurs with any of the results, those results are saturated. As the example below shows, the result element type (int) is twice as wide as the input element type (short).
private Vector128<int> MultiplyDoublingWideningSaturateLowerByScalarTest(Vector64<short> left, Vector64<short> right)
{
return AdvSimd.MultiplyDoublingWideningSaturateLowerByScalar(left, right);
}
// left = <11, 12, 13, 14>
// right = <21, 22, 23, 24>
// Result = <462, 504, 546, 588>
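A plain C# sketch of the broadcast variant, with arrays in place of vectors and a hypothetical helper name; each lane is 2 * left[i] * right[0], saturated to int:

```csharp
using System;

short[] left = { 11, 12, 13, 14 };
short[] right = { 21, 22, 23, 24 };

Console.WriteLine(string.Join(", ", MulDoubleWidenByScalar(left, right))); // 462, 504, 546, 588

// Hypothetical model of MultiplyDoublingWideningSaturateLowerByScalar (short -> int).
static int[] MulDoubleWidenByScalar(short[] left, short[] right)
{
    var result = new int[left.Length];
    for (int i = 0; i < result.Length; i++)
        result[i] = (int)Math.Clamp(2L * left[i] * right[0], int.MinValue, int.MaxValue); // right[0] for every lane
    return result;
}
```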
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector128<long> MultiplyDoublingWideningSaturateLowerByScalar(Vector64<int> left, Vector64<int> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MultiplyDoublingWideningSaturateLowerByScalarTest(System.Runtime.Intrinsics.Vector64`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16]):System.Runtime.Intrinsics.Vector128`1[Int32]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
sqdmull v16.4s, v0.4h, v1.h[0]
mov v0.16b, v16.16b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
15. MultiplyDoublingWideningSaturateLowerBySelectedScalar
Vector128<int> MultiplyDoublingWideningSaturateLowerBySelectedScalar(Vector64<short> left, Vector64<short> right, byte rightIndex)
This method multiplies each element of the left vector by the element at index rightIndex of the right vector, doubles the results, stores them in a vector, and returns that vector. If overflow occurs with any of the results, those results are saturated. As the example below shows, the result element type (int) is twice as wide as the input element type (short).
private Vector128<int> MultiplyDoublingWideningSaturateLowerBySelectedScalarTest(Vector64<short> left, Vector64<short> right, byte rightIndex)
{
return AdvSimd.MultiplyDoublingWideningSaturateLowerBySelectedScalar(left, right, 2);
}
// left = <11, 12, 13, 14>
// right = <21, 22, 23, 24>
// rightIndex = 2
// Result = <506, 552, 598, 644>
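The selected-lane variant, sketched in plain C# (hypothetical helper name, arrays in place of vectors). The same model covers the overload whose right is a Vector128<short>, where rightIndex can address any of its eight lanes; the second call uses an 8-element array to stand in for that case:

```csharp
using System;

short[] left = { 11, 12, 13, 14 };
short[] right = { 21, 22, 23, 24 };
short[] right128 = { 21, 22, 23, 24, 25, 26, 27, 28 }; // stands in for a Vector128<short>

Console.WriteLine(string.Join(", ", MulDoubleWidenByLane(left, right, 2)));    // 506, 552, 598, 644
Console.WriteLine(string.Join(", ", MulDoubleWidenByLane(left, right128, 6))); // 594, 648, 702, 756

// Hypothetical model of MultiplyDoublingWideningSaturateLowerBySelectedScalar (short -> int).
static int[] MulDoubleWidenByLane(short[] left, short[] right, int rightIndex)
{
    short scalar = right[rightIndex];
    var result = new int[left.Length];
    for (int i = 0; i < result.Length; i++)
        result[i] = (int)Math.Clamp(2L * left[i] * scalar, int.MinValue, int.MaxValue);
    return result;
}
```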
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector128<int> MultiplyDoublingWideningSaturateLowerBySelectedScalar(Vector64<short> left, Vector128<short> right, byte rightIndex)
Vector128<long> MultiplyDoublingWideningSaturateLowerBySelectedScalar(Vector64<int> left, Vector64<int> right, byte rightIndex)
Vector128<long> MultiplyDoublingWideningSaturateLowerBySelectedScalar(Vector64<int> left, Vector128<int> right, byte rightIndex)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MultiplyDoublingWideningSaturateLowerBySelectedScalarTest(System.Runtime.Intrinsics.Vector64`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16],ubyte):System.Runtime.Intrinsics.Vector128`1[Int32]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;* V02 arg2 [V02 ] ( 0, 0 ) ubyte -> zero-ref
;# V03 OutArgs [V03 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
sqdmull v16.4s, v0.4h, v1.h[2]
mov v0.16b, v16.16b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
16. MultiplyDoublingWideningSaturateScalar
Vector64<int> MultiplyDoublingWideningSaturateScalar(Vector64<short> left, Vector64<short> right)
This method multiplies the 0th elements of the left and right vectors, doubles the result, stores it in element 0 of the result vector, and returns that vector; the other element is set to 0. The result element is twice as wide as the input elements. If overflow occurs, the result is saturated.
private Vector64<int> MultiplyDoublingWideningSaturateScalarTest(Vector64<short> left, Vector64<short> right)
{
return AdvSimd.Arm64.MultiplyDoublingWideningSaturateScalar(left, right);
}
// left = <11, 12, 13, 14>
// right = <21, 22, 23, 24>
// Result = <462, 0>
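The scalar form returns a Vector64<int>, which has two int lanes; only lane 0 carries the doubled product and lane 1 is zero, as in the example. A plain C# sketch with a hypothetical helper name and arrays in place of vectors:

```csharp
using System;

short[] left = { 11, 12, 13, 14 };
short[] right = { 21, 22, 23, 24 };

int[] result = MulDoubleWidenScalar(left, right);
Console.WriteLine(string.Join(", ", result)); // 462, 0

// Hypothetical model of MultiplyDoublingWideningSaturateScalar (short -> int).
static int[] MulDoubleWidenScalar(short[] left, short[] right)
{
    var result = new int[2]; // Vector64<int> has two lanes; lane 1 stays 0
    result[0] = (int)Math.Clamp(2L * left[0] * right[0], int.MinValue, int.MaxValue);
    return result;
}
```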
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd.Arm64
Vector64<long> MultiplyDoublingWideningSaturateScalar(Vector64<int> left, Vector64<int> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MultiplyDoublingWideningSaturateScalarTest(System.Runtime.Intrinsics.Vector64`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16]):System.Runtime.Intrinsics.Vector64`1[Int32]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
sqdmull s16, h0, h1
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
17. MultiplyDoublingWideningSaturateScalarBySelectedScalar
Vector64<int> MultiplyDoublingWideningSaturateScalarBySelectedScalar(Vector64<short> left, Vector64<short> right, byte rightIndex)
This method multiplies the 0th element of the left vector by the element at index rightIndex of the right vector, doubles the result, stores it in element 0 of the result vector, and returns that vector; the other element is set to 0. All the values in this method are signed integers. If overflow occurs, the result is saturated.
private Vector64<int> MultiplyDoublingWideningSaturateScalarBySelectedScalarTest(Vector64<short> left, Vector64<short> right, byte rightIndex)
{
return AdvSimd.Arm64.MultiplyDoublingWideningSaturateScalarBySelectedScalar(left, right, 0);
}
// left = <11, 12, 13, 14>
// right = <21, 22, 23, 24>
// rightIndex = 0
// Result = <462, 0>
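The same kind of hypothetical scalar model for the selected-scalar variant makes it visible that rightIndex only changes which element of right is read, while left always contributes element 0. As before, this is a sketch in C# 9 top-level statements, with arrays standing in for the vector types and illustrative names that are not part of .NET.

```csharp
using System;

// Hypothetical scalar model of MultiplyDoublingWideningSaturateScalarBySelectedScalar.
// Only left[0] is used; rightIndex chooses which element of right is the multiplier.
static int SaturateToInt(long value) => (int)Math.Clamp(value, int.MinValue, int.MaxValue);

static int[] BySelectedScalarModel(short[] left, short[] right, int rightIndex) =>
    new[] { SaturateToInt(2L * left[0] * right[rightIndex]), 0 };

short[] leftValues = { 11, 12, 13, 14 };
short[] rightValues = { 21, 22, 23, 24 };

int[] atIndex0 = BySelectedScalarModel(leftValues, rightValues, 0); // { 462, 0 }
int[] atIndex3 = BySelectedScalarModel(leftValues, rightValues, 3); // { 528, 0 }, that is 2 * 11 * 24

Console.WriteLine(string.Join(", ", atIndex0));
Console.WriteLine(string.Join(", ", atIndex3));
```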
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd.Arm64
Vector64<int> MultiplyDoublingWideningSaturateScalarBySelectedScalar(Vector64<short> left, Vector128<short> right, byte rightIndex)
Vector64<long> MultiplyDoublingWideningSaturateScalarBySelectedScalar(Vector64<int> left, Vector64<int> right, byte rightIndex)
Vector64<long> MultiplyDoublingWideningSaturateScalarBySelectedScalar(Vector64<int> left, Vector128<int> right, byte rightIndex)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MultiplyDoublingWideningSaturateScalarBySelectedScalarTest(System.Runtime.Intrinsics.Vector64`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16],ubyte):System.Runtime.Intrinsics.Vector64`1[Int32]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;* V02 arg2 [V02 ] ( 0, 0 ) ubyte -> zero-ref
;# V03 OutArgs [V03 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
sqdmull s16, h0, v1.h[0]
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
18. MultiplyDoublingWideningSaturateUpper
Vector128<int> MultiplyDoublingWideningSaturateUpper(Vector128<short> left, Vector128<short> right)
This method multiplies the corresponding elements in the upper halves of the left and right vectors, doubles the products, and returns them, widened to twice the input element width, in the result vector. If overflow occurs with any of the results, those results are saturated.
private Vector128<int> MultiplyDoublingWideningSaturateUpperTest(Vector128<short> left, Vector128<short> right)
{
return AdvSimd.MultiplyDoublingWideningSaturateUpper(left, right);
}
// left = <11, 12, 13, 14, 15, 16, 17, 18>
// right = <21, 22, 23, 24, 25, 26, 27, 28>
// Result = <750, 832, 918, 1008>
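A hypothetical scalar model of the Upper variant shows which lanes are actually read: only lanes 4 to 7 of each input, which is why the example result starts at 2 × 15 × 25 = 750. The sketch uses C# 9 top-level statements, arrays in place of Vector128<short> and Vector128<int>, and illustrative names that are not part of .NET.

```csharp
using System;

// Hypothetical scalar model of MultiplyDoublingWideningSaturateUpper (SQDMULL2).
// Inputs have 8 short lanes; only lanes 4..7 (the upper half) are read.
static int SaturateToInt(long value) => (int)Math.Clamp(value, int.MinValue, int.MaxValue);

static int[] UpperModel(short[] left, short[] right)
{
    var products = new int[left.Length / 2];
    for (int i = 0; i < products.Length; i++)
    {
        int lane = products.Length + i; // index into the upper half
        products[i] = SaturateToInt(2L * left[lane] * right[lane]);
    }
    return products;
}

int[] upper = UpperModel(
    new short[] { 11, 12, 13, 14, 15, 16, 17, 18 },
    new short[] { 21, 22, 23, 24, 25, 26, 27, 28 });
// upper is { 750, 832, 918, 1008 }
Console.WriteLine(string.Join(", ", upper));
```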
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector128<long> MultiplyDoublingWideningSaturateUpper(Vector128<int> left, Vector128<int> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MultiplyDoublingWideningSaturateUpperTest(System.Runtime.Intrinsics.Vector128`1[Int16],System.Runtime.Intrinsics.Vector128`1[Int16]):System.Runtime.Intrinsics.Vector128`1[Int32]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd16 -> d0 HFA(simd16)
; V01 arg1 [V01,T01] ( 3, 3 ) simd16 -> d1 HFA(simd16)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
sqdmull2 v16.4s, v0.8h, v1.8h
mov v0.16b, v16.16b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
19. MultiplyDoublingWideningSaturateUpperByScalar
Vector128<int> MultiplyDoublingWideningSaturateUpperByScalar(Vector128<short> left, Vector64<short> right)
This method multiplies each element in the upper half of the left vector by the 0th element of the right vector, doubles the products, and returns them, widened to twice the input element width, in the result vector. If overflow occurs with any of the results, those results are saturated.
private Vector128<int> MultiplyDoublingWideningSaturateUpperByScalarTest(Vector128<short> left, Vector64<short> right)
{
return AdvSimd.MultiplyDoublingWideningSaturateUpperByScalar(left, right);
}
// left = <11, 12, 13, 14, 15, 16, 17, 18>
// right = <11, 12, 13, 14>
// Result = <330, 352, 374, 396>
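As a sketch under the same assumptions (C# 9 top-level statements, arrays instead of vectors, illustrative names), the model below computes the ByScalar result and checks that it matches multiplying the upper half of left by right[0] broadcast to every lane.

```csharp
using System;
using System.Linq;

// Hypothetical scalar model of MultiplyDoublingWideningSaturateUpperByScalar.
static int SaturateToInt(long value) => (int)Math.Clamp(value, int.MinValue, int.MaxValue);

static int[] UpperByScalarModel(short[] left, short[] right)
{
    int half = left.Length / 2;
    var products = new int[half];
    for (int i = 0; i < half; i++)
    {
        // Upper-half lane of left times element 0 of right, doubled and saturated.
        products[i] = SaturateToInt(2L * left[half + i] * right[0]);
    }
    return products;
}

short[] leftValues = { 11, 12, 13, 14, 15, 16, 17, 18 };
short[] rightValues = { 11, 12, 13, 14 };
int[] byScalar = UpperByScalarModel(leftValues, rightValues); // { 330, 352, 374, 396 }

// Equivalent view: the upper half of left times right[0] broadcast to every lane.
int[] broadcast = leftValues.Skip(4).Select(x => SaturateToInt(2L * x * rightValues[0])).ToArray();
Console.WriteLine(byScalar.SequenceEqual(broadcast));
```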
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector128<long> MultiplyDoublingWideningSaturateUpperByScalar(Vector128<int> left, Vector64<int> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MultiplyDoublingWideningSaturateUpperByScalarTest(System.Runtime.Intrinsics.Vector128`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16]):System.Runtime.Intrinsics.Vector128`1[Int32]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd16 -> d0 HFA(simd16)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
sqdmull2 v16.4s, v0.8h, v1.h[0]
mov v0.16b, v16.16b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
20. MultiplyDoublingWideningSaturateUpperBySelectedScalar
Vector128<int> MultiplyDoublingWideningSaturateUpperBySelectedScalar(Vector128<short> left, Vector64<short> right, byte rightIndex)
This method multiplies each element in the upper half of the left vector by the element at rightIndex of the right vector, doubles the products, and returns them, widened to twice the input element width, in the result vector. If overflow occurs with any of the results, those results are saturated.
private Vector128<int> MultiplyDoublingWideningSaturateUpperBySelectedScalarTest(Vector128<short> left, Vector64<short> right, byte rightIndex)
{
return AdvSimd.MultiplyDoublingWideningSaturateUpperBySelectedScalar(left, right, 2);
}
// left = <11, 12, 13, 14, 15, 16, 17, 18>
// right = <11, 12, 13, 14>
// rightIndex = 2
// Result = <390, 416, 442, 468>
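A hypothetical model of the selected-element form, again as C# 9 top-level statements with arrays in place of vectors and illustrative names. The selected element right[rightIndex] is read once and reused for every upper-half lane.

```csharp
using System;

// Hypothetical scalar model of MultiplyDoublingWideningSaturateUpperBySelectedScalar.
static int SaturateToInt(long value) => (int)Math.Clamp(value, int.MinValue, int.MaxValue);

static int[] UpperBySelectedScalarModel(short[] left, short[] right, int rightIndex)
{
    int half = left.Length / 2;
    short multiplier = right[rightIndex]; // the selected element is reused for every lane
    var products = new int[half];
    for (int i = 0; i < half; i++)
    {
        products[i] = SaturateToInt(2L * left[half + i] * multiplier);
    }
    return products;
}

int[] selected = UpperBySelectedScalarModel(
    new short[] { 11, 12, 13, 14, 15, 16, 17, 18 },
    new short[] { 11, 12, 13, 14 },
    rightIndex: 2);
// selected is { 390, 416, 442, 468 }
Console.WriteLine(string.Join(", ", selected));
```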
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector128<int> MultiplyDoublingWideningSaturateUpperBySelectedScalar(Vector128<short> left, Vector128<short> right, byte rightIndex)
Vector128<long> MultiplyDoublingWideningSaturateUpperBySelectedScalar(Vector128<int> left, Vector64<int> right, byte rightIndex)
Vector128<long> MultiplyDoublingWideningSaturateUpperBySelectedScalar(Vector128<int> left, Vector128<int> right, byte rightIndex)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MultiplyDoublingWideningSaturateUpperBySelectedScalarTest(System.Runtime.Intrinsics.Vector128`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16],ubyte):System.Runtime.Intrinsics.Vector128`1[Int32]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd16 -> d0 HFA(simd16)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;* V02 arg2 [V02 ] ( 0, 0 ) ubyte -> zero-ref
;# V03 OutArgs [V03 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
sqdmull2 v16.4s, v0.8h, v1.h[2]
mov v0.16b, v16.16b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
21. MultiplyDoublingWideningScalarBySelectedScalarAndAddSaturate
Vector64<int> MultiplyDoublingWideningScalarBySelectedScalarAndAddSaturate(Vector64<int> addend, Vector64<short> left, Vector64<short> right, byte rightIndex)
This method multiplies the lowest element of the left vector by the element at rightIndex of the right vector, doubles the product, adds it to the lowest element of the addend vector, and returns the sum in the lowest element of the result vector. The remaining element is set to zero. As the example below shows, the result element type (int) is twice as wide as the input element type (short). If overflow occurs, the result is saturated.
private Vector64<int> MultiplyDoublingWideningScalarBySelectedScalarAndAddSaturateTest(Vector64<int> addend, Vector64<short> left, Vector64<short> right, byte rightIndex)
{
return AdvSimd.Arm64.MultiplyDoublingWideningScalarBySelectedScalarAndAddSaturate(addend, left, right, 0);
}
// addend = <11, 12>
// left = <11, 12, 13, 14>
// right = <21, 22, 23, 24>
// rightIndex = 0
// Result = <473, 0>
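The accumulating form saturates twice. In the Arm pseudocode for SQDMLAL, the doubled product is saturated before it is added to the addend, and the sum is then saturated again. The hypothetical model below (C# 9 top-level statements, arrays in place of vectors, illustrative names) reproduces the example above and shows a case where that intermediate saturation changes the answer.

```csharp
using System;

// Hypothetical scalar model of MultiplyDoublingWideningScalarBySelectedScalarAndAddSaturate.
static int SaturateToInt(long value) => (int)Math.Clamp(value, int.MinValue, int.MaxValue);

static int[] ScalarAddModel(int[] addend, short[] left, short[] right, int rightIndex)
{
    // First saturation: the doubled product is clamped to the int range.
    int product = SaturateToInt(2L * left[0] * right[rightIndex]);
    // Second saturation: the sum is computed in 64 bits and clamped again; upper lane zeroed.
    return new[] { SaturateToInt((long)addend[0] + product), 0 };
}

int[] sum = ScalarAddModel(new[] { 11, 12 }, new short[] { 11, 12, 13, 14 }, new short[] { 21, 22, 23, 24 }, 0);
// sum is { 473, 0 }

// 2 * (-32768) * (-32768) saturates to int.MaxValue before -1 is added,
// so the result is int.MaxValue - 1 rather than int.MaxValue.
int[] edge = ScalarAddModel(new[] { -1, 0 }, new short[] { short.MinValue, 0, 0, 0 }, new short[] { short.MinValue, 0, 0, 0 }, 0);
Console.WriteLine($"{sum[0]} {edge[0]}");
```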
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd.Arm64
Vector64<int> MultiplyDoublingWideningScalarBySelectedScalarAndAddSaturate(Vector64<int> addend, Vector64<short> left, Vector128<short> right, byte rightIndex)
Vector64<long> MultiplyDoublingWideningScalarBySelectedScalarAndAddSaturate(Vector64<long> addend, Vector64<int> left, Vector64<int> right, byte rightIndex)
Vector64<long> MultiplyDoublingWideningScalarBySelectedScalarAndAddSaturate(Vector64<long> addend, Vector64<int> left, Vector128<int> right, byte rightIndex)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MultiplyDoublingWideningScalarBySelectedScalarAndAddSaturateTest(System.Runtime.Intrinsics.Vector64`1[Int32],System.Runtime.Intrinsics.Vector64`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16],ubyte):System.Runtime.Intrinsics.Vector64`1[Int32]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
; V02 arg2 [V02,T02] ( 3, 3 ) simd8 -> d2 HFA(simd8)
;* V03 arg3 [V03 ] ( 0, 0 ) ubyte -> zero-ref
;# V04 OutArgs [V04 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
sqdmlal s0, h1, v2.h[0]
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 20, prolog size 8
22. MultiplyDoublingWideningScalarBySelectedScalarAndSubtractSaturate
Vector64<int> MultiplyDoublingWideningScalarBySelectedScalarAndSubtractSaturate(Vector64<int> minuend, Vector64<short> left, Vector64<short> right, byte rightIndex)
This method multiplies the lowest element of the left vector by the element at rightIndex of the right vector, doubles the product, subtracts it from the lowest element of the minuend vector, and returns the difference in the lowest element of the result vector. The remaining element is set to zero. As the example below shows, the result element type (int) is twice as wide as the input element type (short). If overflow occurs, the result is saturated.
private Vector64<int> MultiplyDoublingWideningScalarBySelectedScalarAndSubtractSaturateTest(Vector64<int> minuend, Vector64<short> left, Vector64<short> right, byte rightIndex)
{
return AdvSimd.Arm64.MultiplyDoublingWideningScalarBySelectedScalarAndSubtractSaturate(minuend, left, right, 0);
}
// minuend = <11, 12>
// left = <11, 12, 13, 14>
// right = <21, 22, 23, 24>
// rightIndex = 0
// Result = <-451, 0>
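A matching hypothetical model for the subtracting form, under the same assumptions (C# 9 top-level statements, arrays instead of vectors, illustrative names). The difference is computed in 64 bits so the final clamp sees the true value instead of a wrapped one.

```csharp
using System;

// Hypothetical scalar model of MultiplyDoublingWideningScalarBySelectedScalarAndSubtractSaturate.
static int SaturateToInt(long value) => (int)Math.Clamp(value, int.MinValue, int.MaxValue);

static int[] ScalarSubtractModel(int[] minuend, short[] left, short[] right, int rightIndex)
{
    int product = SaturateToInt(2L * left[0] * right[rightIndex]);
    // Subtract in 64 bits, then clamp; the upper lane of the result is zeroed.
    return new[] { SaturateToInt((long)minuend[0] - product), 0 };
}

int[] difference = ScalarSubtractModel(new[] { 11, 12 }, new short[] { 11, 12, 13, 14 }, new short[] { 21, 22, 23, 24 }, 0);
// difference is { -451, 0 }

// Near the bottom of the int range the result clamps to int.MinValue instead of wrapping.
int[] clamped = ScalarSubtractModel(new[] { int.MinValue + 100, 0 }, new short[] { 11, 0, 0, 0 }, new short[] { 21, 0, 0, 0 }, 0);
Console.WriteLine($"{difference[0]} {clamped[0]}");
```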
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd.Arm64
Vector64<int> MultiplyDoublingWideningScalarBySelectedScalarAndSubtractSaturate(Vector64<int> minuend, Vector64<short> left, Vector128<short> right, byte rightIndex)
Vector64<long> MultiplyDoublingWideningScalarBySelectedScalarAndSubtractSaturate(Vector64<long> minuend, Vector64<int> left, Vector64<int> right, byte rightIndex)
Vector64<long> MultiplyDoublingWideningScalarBySelectedScalarAndSubtractSaturate(Vector64<long> minuend, Vector64<int> left, Vector128<int> right, byte rightIndex)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MultiplyDoublingWideningScalarBySelectedScalarAndSubtractSaturateTest(System.Runtime.Intrinsics.Vector64`1[Int32],System.Runtime.Intrinsics.Vector64`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16],ubyte):System.Runtime.Intrinsics.Vector64`1[Int32]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
; V02 arg2 [V02,T02] ( 3, 3 ) simd8 -> d2 HFA(simd8)
;* V03 arg3 [V03 ] ( 0, 0 ) ubyte -> zero-ref
;# V04 OutArgs [V04 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
sqdmlsl s0, h1, v2.h[0]
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 20, prolog size 8
23. MultiplyDoublingWideningUpperAndAddSaturate
Vector128<int> MultiplyDoublingWideningUpperAndAddSaturate(Vector128<int> addend, Vector128<short> left, Vector128<short> right)
This method multiplies the corresponding elements in the upper halves of the left and right vectors, doubles the products, and adds them to the corresponding elements of the addend vector. The result element type (int) is twice as wide as the input element type (short). If overflow occurs with any of the results, those results are saturated.
private Vector128<int> MultiplyDoublingWideningUpperAndAddSaturateTest(Vector128<int> addend, Vector128<short> left, Vector128<short> right)
{
return AdvSimd.MultiplyDoublingWideningUpperAndAddSaturate(addend, left, right);
}
// addend = <11, 12, 13, 14>
// left = <11, 12, 13, 14, 15, 16, 17, 18>
// right = <21, 22, 23, 24, 25, 26, 27, 28>
// Result = <761, 844, 931, 1022>
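Here is a hypothetical scalar model of the upper-half accumulate (C# 9 top-level statements, arrays in place of vectors, illustrative names). Each of the four int accumulator lanes lines up with one product from lanes 4 to 7 of left and right, and the product is saturated before the addition, following the Arm pseudocode for SQDMLAL2.

```csharp
using System;

// Hypothetical scalar model of MultiplyDoublingWideningUpperAndAddSaturate (SQDMLAL2).
static int SaturateToInt(long value) => (int)Math.Clamp(value, int.MinValue, int.MaxValue);

static int[] UpperAddModel(int[] addend, short[] left, short[] right)
{
    int half = left.Length / 2;
    var sums = new int[half];
    for (int i = 0; i < half; i++)
    {
        // Saturated doubled product of the upper-half lanes, then a saturating add to addend[i].
        int product = SaturateToInt(2L * left[half + i] * right[half + i]);
        sums[i] = SaturateToInt((long)addend[i] + product);
    }
    return sums;
}

int[] accumulated = UpperAddModel(
    new[] { 11, 12, 13, 14 },
    new short[] { 11, 12, 13, 14, 15, 16, 17, 18 },
    new short[] { 21, 22, 23, 24, 25, 26, 27, 28 });
// accumulated is { 761, 844, 931, 1022 }
Console.WriteLine(string.Join(", ", accumulated));
```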
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector128<long> MultiplyDoublingWideningUpperAndAddSaturate(Vector128<long> addend, Vector128<int> left, Vector128<int> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MultiplyDoublingWideningUpperAndAddSaturateTest(System.Runtime.Intrinsics.Vector128`1[Int32],System.Runtime.Intrinsics.Vector128`1[Int16],System.Runtime.Intrinsics.Vector128`1[Int16]):System.Runtime.Intrinsics.Vector128`1[Int32]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd16 -> d0 HFA(simd16)
; V01 arg1 [V01,T01] ( 3, 3 ) simd16 -> d1 HFA(simd16)
; V02 arg2 [V02,T02] ( 3, 3 ) simd16 -> d2 HFA(simd16)
;# V03 OutArgs [V03 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
sqdmlal2 v0.4s, v1.8h, v2.8h
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 20, prolog size 8
24. MultiplyDoublingWideningUpperAndSubtractSaturate
Vector128<int> MultiplyDoublingWideningUpperAndSubtractSaturate(Vector128<int> minuend, Vector128<short> left, Vector128<short> right)
This method multiplies the corresponding elements in the upper halves of the left and right vectors, doubles the products, and subtracts them from the corresponding elements of the minuend vector. As the example below shows, the result element type (int) is twice as wide as the input element type (short). If overflow occurs with any of the results, those results are saturated.
private Vector128<int> MultiplyDoublingWideningUpperAndSubtractSaturateTest(Vector128<int> minuend, Vector128<short> left, Vector128<short> right)
{
return AdvSimd.MultiplyDoublingWideningUpperAndSubtractSaturate(minuend, left, right);
}
// minuend = <11, 12, 13, 14>
// left = <11, 12, 13, 14, 15, 16, 17, 18>
// right = <21, 22, 23, 24, 25, 26, 27, 28>
// Result = <-739, -820, -905, -994>
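The subtracting counterpart, as a hypothetical scalar model under the same assumptions (C# 9 top-level statements, arrays instead of vectors, illustrative names):

```csharp
using System;

// Hypothetical scalar model of MultiplyDoublingWideningUpperAndSubtractSaturate (SQDMLSL2).
static int SaturateToInt(long value) => (int)Math.Clamp(value, int.MinValue, int.MaxValue);

static int[] UpperSubtractModel(int[] minuend, short[] left, short[] right)
{
    int half = left.Length / 2;
    var differences = new int[half];
    for (int i = 0; i < half; i++)
    {
        // Saturated doubled product of the upper-half lanes, subtracted from minuend[i] with saturation.
        int product = SaturateToInt(2L * left[half + i] * right[half + i]);
        differences[i] = SaturateToInt((long)minuend[i] - product);
    }
    return differences;
}

int[] actual = UpperSubtractModel(
    new[] { 11, 12, 13, 14 },
    new short[] { 11, 12, 13, 14, 15, 16, 17, 18 },
    new short[] { 21, 22, 23, 24, 25, 26, 27, 28 });
// actual is { -739, -820, -905, -994 }
Console.WriteLine(string.Join(", ", actual));
```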
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector128<long> MultiplyDoublingWideningUpperAndSubtractSaturate(Vector128<long> minuend, Vector128<int> left, Vector128<int> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MultiplyDoublingWideningUpperAndSubtractSaturateTest(System.Runtime.Intrinsics.Vector128`1[Int32],System.Runtime.Intrinsics.Vector128`1[Int16],System.Runtime.Intrinsics.Vector128`1[Int16]):System.Runtime.Intrinsics.Vector128`1[Int32]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd16 -> d0 HFA(simd16)
; V01 arg1 [V01,T01] ( 3, 3 ) simd16 -> d1 HFA(simd16)
; V02 arg2 [V02,T02] ( 3, 3 ) simd16 -> d2 HFA(simd16)
;# V03 OutArgs [V03 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
sqdmlsl2 v0.4s, v1.8h, v2.8h
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 20, prolog size 8
25. MultiplyDoublingWideningUpperByScalarAndAddSaturate
Vector128<int> MultiplyDoublingWideningUpperByScalarAndAddSaturate(Vector128<int> addend, Vector128<short> left, Vector64<short> right)
This method multiplies each element in the upper half of the left vector by the 0th element of the right vector, doubles the products, and adds them to the corresponding elements of the addend vector. As the example below shows, the result element type (int) is twice as wide as the input element type (short). If overflow occurs with any of the results, those results are saturated.
private Vector128<int> MultiplyDoublingWideningUpperByScalarAndAddSaturateTest(Vector128<int> addend, Vector128<short> left, Vector64<short> right)
{
return AdvSimd.MultiplyDoublingWideningUpperByScalarAndAddSaturate(addend, left, right);
}
// addend = <11, 12, 13, 14>
// left = <11, 12, 13, 14, 15, 16, 17, 18>
// right = <11, 12, 13, 14>
// Result = <341, 364, 387, 410>
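A hypothetical scalar model of the ByScalar accumulate (C# 9 top-level statements, arrays in place of vectors, illustrative names). Besides reproducing the example, it shows that a lane whose sum exceeds int.MaxValue is clamped rather than wrapped.

```csharp
using System;

// Hypothetical scalar model of MultiplyDoublingWideningUpperByScalarAndAddSaturate.
static int SaturateToInt(long value) => (int)Math.Clamp(value, int.MinValue, int.MaxValue);

static int[] UpperByScalarAddModel(int[] addend, short[] left, short[] right)
{
    int half = left.Length / 2;
    var sums = new int[half];
    for (int i = 0; i < half; i++)
    {
        // Every upper-half lane of left is multiplied by right[0].
        int product = SaturateToInt(2L * left[half + i] * right[0]);
        sums[i] = SaturateToInt((long)addend[i] + product);
    }
    return sums;
}

short[] leftValues = { 11, 12, 13, 14, 15, 16, 17, 18 };
short[] rightValues = { 11, 12, 13, 14 };

int[] accumulated = UpperByScalarAddModel(new[] { 11, 12, 13, 14 }, leftValues, rightValues);
// accumulated is { 341, 364, 387, 410 }

// With int.MaxValue in the first accumulator lane, that lane stays at int.MaxValue.
int[] nearMax = UpperByScalarAddModel(new[] { int.MaxValue, 12, 13, 14 }, leftValues, rightValues);
Console.WriteLine($"{string.Join(", ", accumulated)} | {nearMax[0]}");
```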
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector128<long> MultiplyDoublingWideningUpperByScalarAndAddSaturate(Vector128<long> addend, Vector128<int> left, Vector64<int> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MultiplyDoublingWideningUpperByScalarAndAddSaturateTest(System.Runtime.Intrinsics.Vector128`1[Int32],System.Runtime.Intrinsics.Vector128`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16]):System.Runtime.Intrinsics.Vector128`1[Int32]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd16 -> d0 HFA(simd16)
; V01 arg1 [V01,T01] ( 3, 3 ) simd16 -> d1 HFA(simd16)
; V02 arg2 [V02,T02] ( 3, 3 ) simd8 -> d2 HFA(simd8)
;# V03 OutArgs [V03 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
sqdmlal2 v0.4s, v1.8h, v2.h[0]
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 20, prolog size 8
26. MultiplyDoublingWideningUpperByScalarAndSubtractSaturate
Vector128<int> MultiplyDoublingWideningUpperByScalarAndSubtractSaturate(Vector128<int> minuend, Vector128<short> left, Vector64<short> right)
This method multiplies each element in the upper half of the left vector by the 0th element of the right vector, doubles the products, and subtracts them from the corresponding elements of the minuend vector. As the example below shows, the result element type (int) is twice as wide as the input element type (short). If overflow occurs with any of the results, those results are saturated.
private Vector128<int> MultiplyDoublingWideningUpperByScalarAndSubtractSaturateTest(Vector128<int> minuend, Vector128<short> left, Vector64<short> right)
{
return AdvSimd.MultiplyDoublingWideningUpperByScalarAndSubtractSaturate(minuend, left, right);
}
// minuend = <11, 12, 13, 14>
// left = <11, 12, 13, 14, 15, 16, 17, 18>
// right = <11, 12, 13, 14>
// Result = <-319, -340, -361, -382>
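The same idea for the subtracting form, as a hypothetical model under the same assumptions (C# 9 top-level statements, arrays in place of vectors, illustrative names):

```csharp
using System;

// Hypothetical scalar model of MultiplyDoublingWideningUpperByScalarAndSubtractSaturate.
static int SaturateToInt(long value) => (int)Math.Clamp(value, int.MinValue, int.MaxValue);

static int[] UpperByScalarSubtractModel(int[] minuend, short[] left, short[] right)
{
    int half = left.Length / 2;
    var differences = new int[half];
    for (int i = 0; i < half; i++)
    {
        // Upper-half lane of left times right[0], doubled, then subtracted from minuend[i].
        int product = SaturateToInt(2L * left[half + i] * right[0]);
        differences[i] = SaturateToInt((long)minuend[i] - product);
    }
    return differences;
}

int[] actual = UpperByScalarSubtractModel(
    new[] { 11, 12, 13, 14 },
    new short[] { 11, 12, 13, 14, 15, 16, 17, 18 },
    new short[] { 11, 12, 13, 14 });
// actual is { -319, -340, -361, -382 }
Console.WriteLine(string.Join(", ", actual));
```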
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector128<long> MultiplyDoublingWideningUpperByScalarAndSubtractSaturate(Vector128<long> minuend, Vector128<int> left, Vector64<int> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MultiplyDoublingWideningUpperByScalarAndSubtractSaturateTest(System.Runtime.Intrinsics.Vector128`1[Int32],System.Runtime.Intrinsics.Vector128`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16]):System.Runtime.Intrinsics.Vector128`1[Int32]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd16 -> d0 HFA(simd16)
; V01 arg1 [V01,T01] ( 3, 3 ) simd16 -> d1 HFA(simd16)
; V02 arg2 [V02,T02] ( 3, 3 ) simd8 -> d2 HFA(simd8)
;# V03 OutArgs [V03 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
sqdmlsl2 v0.4s, v1.8h, v2.h[0]
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 20, prolog size 8
27. MultiplyDoublingWideningUpperBySelectedScalarAndAddSaturate
Vector128<int> MultiplyDoublingWideningUpperBySelectedScalarAndAddSaturate(Vector128<int> addend, Vector128<short> left, Vector64<short> right, byte rightIndex)
This method multiplies each element in the upper half of the left vector by the element at rightIndex of the right vector, doubles the products, and adds them to the corresponding elements of the addend vector. As the example below shows, the result element type (int) is twice as wide as the input element type (short). If overflow occurs with any of the results, those results are saturated.
private Vector128<int> MultiplyDoublingWideningUpperBySelectedScalarAndAddSaturateTest(Vector128<int> addend, Vector128<short> left, Vector64<short> right, byte rightIndex)
{
return AdvSimd.MultiplyDoublingWideningUpperBySelectedScalarAndAddSaturate(addend, left, right, 2);
}
// addend = <11, 12, 13, 14>
// left = <11, 12, 13, 14, 15, 16, 17, 18>
// right = <11, 12, 13, 14>
// rightIndex = 2
// Result = <401, 428, 455, 482>
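A hypothetical scalar model of the selected-element accumulate, using rightIndex = 2 as in the example (C# 9 top-level statements, arrays in place of vectors, illustrative names):

```csharp
using System;

// Hypothetical scalar model of MultiplyDoublingWideningUpperBySelectedScalarAndAddSaturate.
static int SaturateToInt(long value) => (int)Math.Clamp(value, int.MinValue, int.MaxValue);

static int[] UpperBySelectedScalarAddModel(int[] addend, short[] left, short[] right, int rightIndex)
{
    int half = left.Length / 2;
    var sums = new int[half];
    for (int i = 0; i < half; i++)
    {
        // Upper-half lane of left times right[rightIndex], doubled and saturated, then added.
        int product = SaturateToInt(2L * left[half + i] * right[rightIndex]);
        sums[i] = SaturateToInt((long)addend[i] + product);
    }
    return sums;
}

int[] accumulated = UpperBySelectedScalarAddModel(
    new[] { 11, 12, 13, 14 },
    new short[] { 11, 12, 13, 14, 15, 16, 17, 18 },
    new short[] { 11, 12, 13, 14 },
    rightIndex: 2);
// accumulated is { 401, 428, 455, 482 }
Console.WriteLine(string.Join(", ", accumulated));
```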
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector128<int> MultiplyDoublingWideningUpperBySelectedScalarAndAddSaturate(Vector128<int> addend, Vector128<short> left, Vector128<short> right, byte rightIndex)
Vector128<long> MultiplyDoublingWideningUpperBySelectedScalarAndAddSaturate(Vector128<long> addend, Vector128<int> left, Vector64<int> right, byte rightIndex)
Vector128<long> MultiplyDoublingWideningUpperBySelectedScalarAndAddSaturate(Vector128<long> addend, Vector128<int> left, Vector128<int> right, byte rightIndex)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MultiplyDoublingWideningUpperBySelectedScalarAndAddSaturateTest(System.Runtime.Intrinsics.Vector128`1[Int32],System.Runtime.Intrinsics.Vector128`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16],ubyte):System.Runtime.Intrinsics.Vector128`1[Int32]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd16 -> d0 HFA(simd16)
; V01 arg1 [V01,T01] ( 3, 3 ) simd16 -> d1 HFA(simd16)
; V02 arg2 [V02,T02] ( 3, 3 ) simd8 -> d2 HFA(simd8)
;* V03 arg3 [V03 ] ( 0, 0 ) ubyte -> zero-ref
;# V04 OutArgs [V04 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
sqdmlal2 v0.4s, v1.8h, v2.h[2]
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 20, prolog size 8
28. MultiplyDoublingWideningUpperBySelectedScalarAndSubtractSaturate
Vector128<int> MultiplyDoublingWideningUpperBySelectedScalarAndSubtractSaturate(Vector128<int> minuend, Vector128<short> left, Vector64<short> right, byte rightIndex)
This method multiplies each element in the upper half of the left vector by the element at rightIndex of the right vector, doubles the products, and subtracts them from the corresponding elements of the minuend vector. As the example below shows, the result element type (int) is twice as wide as the input element type (short). If overflow occurs with any of the results, those results are saturated.
private Vector128<int> MultiplyDoublingWideningUpperBySelectedScalarAndSubtractSaturateTest(Vector128<int> minuend, Vector128<short> left, Vector64<short> right, byte rightIndex)
{
return AdvSimd.MultiplyDoublingWideningUpperBySelectedScalarAndSubtractSaturate(minuend, left, right, 2);
}
// minuend = <11, 12, 13, 14>
// left = <11, 12, 13, 14, 15, 16, 17, 18>
// right = <11, 12, 13, 14>
// rightIndex = 2
// Result = <-379, -404, -429, -454>
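Because the add and subtract forms differ only in how the saturated product is combined with the accumulator, one hypothetical model with a subtract flag can cover both. The sketch uses C# 9 top-level statements, arrays in place of vectors, and illustrative names; it reproduces this example and the AndAddSaturate results for the same inputs.

```csharp
using System;

// Hypothetical scalar model covering both ...UpperBySelectedScalarAndAddSaturate
// and ...UpperBySelectedScalarAndSubtractSaturate.
static int SaturateToInt(long value) => (int)Math.Clamp(value, int.MinValue, int.MaxValue);

static int[] UpperBySelectedScalarAccumulateModel(
    int[] accumulator, short[] left, short[] right, int rightIndex, bool subtract)
{
    int half = left.Length / 2;
    var results = new int[half];
    for (int i = 0; i < half; i++)
    {
        int product = SaturateToInt(2L * left[half + i] * right[rightIndex]);
        long combined = subtract ? (long)accumulator[i] - product : (long)accumulator[i] + product;
        results[i] = SaturateToInt(combined);
    }
    return results;
}

int[] accumulatorValues = { 11, 12, 13, 14 };
short[] leftValues = { 11, 12, 13, 14, 15, 16, 17, 18 };
short[] rightValues = { 11, 12, 13, 14 };

int[] subtracted = UpperBySelectedScalarAccumulateModel(accumulatorValues, leftValues, rightValues, 2, subtract: true);
int[] added = UpperBySelectedScalarAccumulateModel(accumulatorValues, leftValues, rightValues, 2, subtract: false);
// subtracted is { -379, -404, -429, -454 }; added is { 401, 428, 455, 482 }
Console.WriteLine($"{string.Join(", ", subtracted)} | {string.Join(", ", added)}");
```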
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector128<int> MultiplyDoublingWideningUpperBySelectedScalarAndSubtractSaturate(Vector128<int> minuend, Vector128<short> left, Vector128<short> right, byte rightIndex)
Vector128<long> MultiplyDoublingWideningUpperBySelectedScalarAndSubtractSaturate(Vector128<long> minuend, Vector128<int> left, Vector64<int> right, byte rightIndex)
Vector128<long> MultiplyDoublingWideningUpperBySelectedScalarAndSubtractSaturate(Vector128<long> minuend, Vector128<int> left, Vector128<int> right, byte rightIndex)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MultiplyDoublingWideningUpperBySelectedScalarAndSubtractSaturateTest(System.Runtime.Intrinsics.Vector128`1[Int32],System.Runtime.Intrinsics.Vector128`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16],ubyte):System.Runtime.Intrinsics.Vector128`1[Int32]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd16 -> d0 HFA(simd16)
; V01 arg1 [V01,T01] ( 3, 3 ) simd16 -> d1 HFA(simd16)
; V02 arg2 [V02,T02] ( 3, 3 ) simd8 -> d2 HFA(simd8)
;* V03 arg3 [V03 ] ( 0, 0 ) ubyte -> zero-ref
;# V04 OutArgs [V04 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
sqdmlsl2 v0.4s, v1.8h, v2.h[2]
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 20, prolog size 8
29. MultiplyExtended
Vector64<float> MultiplyExtended(Vector64<float> left, Vector64<float> right)
This method multiplies the corresponding floating-point values in the left and right vectors and returns the products in the result vector. It behaves like an ordinary floating-point multiplication except in one case: according to the Arm documentation, if one value is zero and the other value is infinite, the result is 2.0 instead of NaN. In that case, the result is negative if exactly one of the values is negative; otherwise it is positive.
private Vector64<float> MultiplyExtendedTest(Vector64<float> left, Vector64<float> right)
{
return AdvSimd.Arm64.MultiplyExtended(left, right);
}
// left = <11.5, 12.5>
// right = <21.5, 22.5>
// Result = <247.25, 281.25>
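The only behavioral difference from a plain multiply is the zero-times-infinity case, so a per-lane model is short. The sketch below (C# 9 top-level statements; MulX is a hypothetical name) follows the Arm description above and deliberately ignores NaN propagation details and FPCR modes such as flush-to-zero.

```csharp
using System;

// Hypothetical per-lane model of FMULX: an ordinary IEEE multiply, except that
// zero times infinity (in either order) returns 2.0 instead of NaN, negative when
// exactly one operand is negative. NaN details and FPCR modes are not modelled.
static float MulX(float a, float b)
{
    bool zeroTimesInfinity = (a == 0f && float.IsInfinity(b)) || (float.IsInfinity(a) && b == 0f);
    if (zeroTimesInfinity)
    {
        return (float.IsNegative(a) ^ float.IsNegative(b)) ? -2.0f : 2.0f;
    }
    return a * b;
}

// MultiplyExtended applies MulX lane by lane.
float[] leftValues = { 11.5f, 12.5f };
float[] rightValues = { 21.5f, 22.5f };
var products = new float[leftValues.Length];
for (int i = 0; i < products.Length; i++)
{
    products[i] = MulX(leftValues[i], rightValues[i]);
}
// products is { 247.25f, 281.25f }, as in the example above.

float special = MulX(0f, float.NegativeInfinity); // -2.0f, because the signs differ
float ordinary = 0f * float.NegativeInfinity;     // NaN with a plain multiply
Console.WriteLine($"{special} {float.IsNaN(ordinary)}");
```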
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd.Arm64
Vector128<double> MultiplyExtended(Vector128<double> left, Vector128<double> right)
Vector128<float> MultiplyExtended(Vector128<float> left, Vector128<float> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MultiplyExtendedTest(System.Runtime.Intrinsics.Vector64`1[Single],System.Runtime.Intrinsics.Vector64`1[Single]):System.Runtime.Intrinsics.Vector64`1[Single]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
fmulx v16.2s, v0.2s, v1.2s
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
30. MultiplyExtendedByScalar
Vector128<double> MultiplyExtendedByScalar(Vector128<double> left, Vector64<double> right)
This method multiplies each floating-point element of the left vector by the single floating-point element of the right vector and returns the products in the result vector. According to the Arm documentation, if one value is zero and the other value is infinite, the result is 2.0 instead of NaN. In that case, the result is negative if exactly one of the values is negative; otherwise it is positive.
private Vector128<double> MultiplyExtendedByScalarTest(Vector128<double> left, Vector64<double> right)
{
return AdvSimd.Arm64.MultiplyExtendedByScalar(left, right);
}
// left = <11.5, 12.5>
// right = <11.5>
// Result = <132.25, 143.75>
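A hypothetical model of the ByScalar form for double, as C# 9 top-level statements with arrays in place of vectors and illustrative names (NaN details and FPCR modes are not modelled). Every lane of left is multiplied by the single element of right, including the special zero-times-infinity case.

```csharp
using System;

// Hypothetical per-lane model of FMULX for double.
static double MulX(double a, double b)
{
    bool zeroTimesInfinity = (a == 0.0 && double.IsInfinity(b)) || (double.IsInfinity(a) && b == 0.0);
    if (zeroTimesInfinity)
    {
        return (double.IsNegative(a) ^ double.IsNegative(b)) ? -2.0 : 2.0;
    }
    return a * b;
}

// MultiplyExtendedByScalar: every lane of left is multiplied by right's only element.
static double[] ByScalarModel(double[] left, double right0)
{
    var products = new double[left.Length];
    for (int i = 0; i < left.Length; i++)
    {
        products[i] = MulX(left[i], right0);
    }
    return products;
}

double[] sample = ByScalarModel(new[] { 11.5, 12.5 }, 11.5);                     // { 132.25, 143.75 }
double[] special = ByScalarModel(new[] { 0.0, -3.0 }, double.NegativeInfinity); // { -2.0, +Infinity }
Console.WriteLine($"{sample[0]} {sample[1]} {special[0]} {special[1]}");
```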
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MultiplyExtendedByScalarTest(System.Runtime.Intrinsics.Vector128`1[Double],System.Runtime.Intrinsics.Vector64`1[Double]):System.Runtime.Intrinsics.Vector128`1[Double]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd16 -> d0 HFA(simd16)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
fmulx v16.2d, v0.2d, v1.d[0]
mov v0.16b, v16.16b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
31. MultiplyExtendedBySelectedScalar
Vector64<float> MultiplyExtendedBySelectedScalar(Vector64<float> left, Vector64<float> right, byte rightIndex)
This method multiplies each floating-point element of the left vector by the element at rightIndex of the right vector and returns the products in the result vector. According to the Arm documentation, if one value is zero and the other value is infinite, the result is 2.0 instead of NaN. In that case, the result is negative if exactly one of the values is negative; otherwise it is positive.
private Vector64<float> MultiplyExtendedBySelectedScalarTest(Vector64<float> left, Vector64<float> right, byte rightIndex)
{
return AdvSimd.Arm64.MultiplyExtendedBySelectedScalar(left, right, 0);
}
// left = <11.5, 12.5>
// right = <21.5, 22.5>
// rightIndex = 0
// Result = <247.25, 268.75>
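A hypothetical model of the selected-element form, evaluated for both possible values of rightIndex. It is a sketch in C# 9 top-level statements with arrays in place of vectors and illustrative names; NaN details and FPCR modes are not modelled.

```csharp
using System;

// Hypothetical per-lane model of FMULX for float.
static float MulX(float a, float b)
{
    bool zeroTimesInfinity = (a == 0f && float.IsInfinity(b)) || (float.IsInfinity(a) && b == 0f);
    if (zeroTimesInfinity)
    {
        return (float.IsNegative(a) ^ float.IsNegative(b)) ? -2.0f : 2.0f;
    }
    return a * b;
}

// MultiplyExtendedBySelectedScalar: every lane of left times right[rightIndex].
static float[] BySelectedScalarModel(float[] left, float[] right, int rightIndex)
{
    var products = new float[left.Length];
    for (int i = 0; i < left.Length; i++)
    {
        products[i] = MulX(left[i], right[rightIndex]);
    }
    return products;
}

float[] leftValues = { 11.5f, 12.5f };
float[] rightValues = { 21.5f, 22.5f };
float[] atIndex0 = BySelectedScalarModel(leftValues, rightValues, 0); // { 247.25f, 268.75f }
float[] atIndex1 = BySelectedScalarModel(leftValues, rightValues, 1); // { 258.75f, 281.25f }
Console.WriteLine($"{atIndex0[0]} {atIndex0[1]} {atIndex1[0]} {atIndex1[1]}");
```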
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd.Arm64
Vector64<float> MultiplyExtendedBySelectedScalar(Vector64<float> left, Vector128<float> right, byte rightIndex)
Vector128<double> MultiplyExtendedBySelectedScalar(Vector128<double> left, Vector128<double> right, byte rightIndex)
Vector128<float> MultiplyExtendedBySelectedScalar(Vector128<float> left, Vector64<float> right, byte rightIndex)
Vector128<float> MultiplyExtendedBySelectedScalar(Vector128<float> left, Vector128<float> right, byte rightIndex)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MultiplyExtendedBySelectedScalarTest(System.Runtime.Intrinsics.Vector64`1[Single],System.Runtime.Intrinsics.Vector64`1[Single],ubyte):System.Runtime.Intrinsics.Vector64`1[Single]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;* V02 arg2 [V02 ] ( 0, 0 ) ubyte -> zero-ref
;# V03 OutArgs [V03 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
fmulx v16.2s, v0.2s, v1.s[0]
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
32. MultiplyExtendedScalar
Vector64<double> MultiplyExtendedScalar(Vector64<double> left, Vector64<double> right)
This method multiplies the lowest floating-point elements of the left and right vectors and returns the product in the lowest element of the result vector. According to the Arm documentation, if one value is zero and the other value is infinite, the result is 2.0 instead of NaN. In that case, the result is negative if exactly one of the values is negative; otherwise it is positive.
private Vector64<double> MultiplyExtendedScalarTest(Vector64<double> left, Vector64<double> right)
{
return AdvSimd.Arm64.MultiplyExtendedScalar(left, right);
}
// left = <11.5>
// right = <11.5>
// Result = <132.25>
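For the scalar double form, a one-lane hypothetical model is enough to contrast MultiplyExtendedScalar with an ordinary multiply. The sketch uses C# 9 top-level statements; MulX is an illustrative name, and NaN details and FPCR modes are not modelled.

```csharp
using System;

// Hypothetical one-lane model of FMULX for double.
static double MulX(double a, double b)
{
    bool zeroTimesInfinity = (a == 0.0 && double.IsInfinity(b)) || (double.IsInfinity(a) && b == 0.0);
    if (zeroTimesInfinity)
    {
        return (double.IsNegative(a) ^ double.IsNegative(b)) ? -2.0 : 2.0;
    }
    return a * b;
}

double product = MulX(11.5, 11.5);                    // 132.25, as in the example above
double extended = MulX(double.NegativeInfinity, 0.0); // -2.0
double plain = double.NegativeInfinity * 0.0;         // NaN with an ordinary multiply
Console.WriteLine($"{product} {extended} {double.IsNaN(plain)}");
```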
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd.Arm64
Vector64<float> MultiplyExtendedScalar(Vector64<float> left, Vector64<float> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MultiplyExtendedScalarTest(System.Runtime.Intrinsics.Vector64`1[Double],System.Runtime.Intrinsics.Vector64`1[Double]):System.Runtime.Intrinsics.Vector64`1[Double]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
fmulx d16, d0, d1
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
33. MultiplyExtendedScalarBySelectedScalar
Vector64<double> MultiplyExtendedScalarBySelectedScalar(Vector64<double> left, Vector128<double> right, byte rightIndex)
This method multiplies the lowest floating-point element of the left vector by the element at rightIndex of the right vector and returns the product in the lowest element of the result vector. According to the Arm documentation, if one value is zero and the other value is infinite, the result is 2.0 instead of NaN. In that case, the result is negative if exactly one of the values is negative; otherwise it is positive.
private Vector64<double> MultiplyExtendedScalarBySelectedScalarTest(Vector64<double> left, Vector128<double> right, byte rightIndex)
{
return AdvSimd.Arm64.MultiplyExtendedScalarBySelectedScalar(left, right, 0);
}
// left = <11.5>
// right = <11.5, 12.5>
// rightIndex = 0
// Result = <132.25>
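A hypothetical model of the selected-element scalar form, as C# 9 top-level statements with arrays in place of vectors and illustrative names (NaN details and FPCR modes are not modelled). Only left[0] participates, and rightIndex picks the multiplier from right.

```csharp
using System;

// Hypothetical one-lane model of FMULX for double.
static double MulX(double a, double b)
{
    bool zeroTimesInfinity = (a == 0.0 && double.IsInfinity(b)) || (double.IsInfinity(a) && b == 0.0);
    if (zeroTimesInfinity)
    {
        return (double.IsNegative(a) ^ double.IsNegative(b)) ? -2.0 : 2.0;
    }
    return a * b;
}

// Only left[0] is used; rightIndex selects the multiplier from right.
static double[] ScalarBySelectedScalarModel(double[] left, double[] right, int rightIndex) =>
    new[] { MulX(left[0], right[rightIndex]) };

double[] leftValues = { 11.5 };
double[] rightValues = { 11.5, 12.5 };
double[] atIndex0 = ScalarBySelectedScalarModel(leftValues, rightValues, 0); // { 132.25 }
double[] atIndex1 = ScalarBySelectedScalarModel(leftValues, rightValues, 1); // { 143.75 }
Console.WriteLine($"{atIndex0[0]} {atIndex1[0]}");
```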
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd.Arm64
Vector64<float> MultiplyExtendedScalarBySelectedScalar(Vector64<float> left, Vector64<float> right, byte rightIndex)
Vector64<float> MultiplyExtendedScalarBySelectedScalar(Vector64<float> left, Vector128<float> right, byte rightIndex)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MultiplyExtendedScalarBySelectedScalarTest(System.Runtime.Intrinsics.Vector64`1[Double],System.Runtime.Intrinsics.Vector128`1[Double],ubyte):System.Runtime.Intrinsics.Vector64`1[Double]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd16 -> d1 HFA(simd16)
;* V02 arg2 [V02 ] ( 0, 0 ) ubyte -> zero-ref
;# V03 OutArgs [V03 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
fmulx d16, d0, v1.d[0]
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
34. MultiplyRoundedDoublingByScalarSaturateHigh
Vector64<short> MultiplyRoundedDoublingByScalarSaturateHigh(Vector64<short> left, Vector64<short> right)
This method multiplies each vector element in the left vector by the 0th vector element of the right vector, doubles the results, stores the most significant half of each result in a vector, and returns the result vector. The results are rounded, and any result that overflows is saturated.
private Vector64<short> MultiplyRoundedDoublingByScalarSaturateHighTest(Vector64<short> left, Vector64<short> right)
{
    return AdvSimd.MultiplyRoundedDoublingByScalarSaturateHigh(left, right);
}
// left = <1000, 2000, 3000, 4000>
// right = <30, 40, 50, 60>
// Result = <1, 2, 3, 4>
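To see where these values come from, here is a hedged scalar model of the rounding doubling multiply-high that the generated `sqrdmulh` instruction performs on 16-bit lanes: double the product, add the rounding constant 2^15, keep the upper 16 bits, and saturate. Lanes are modeled as plain `short[]` arrays rather than `Vector64<short>`, and `RoundingDoublingHigh16` is a hypothetical helper, not a .NET API.

```csharp
using System;

// Hypothetical scalar model of SQRDMULH on 16-bit lanes (not a .NET API):
// double the product, add the rounding constant 2^15, keep the high 16 bits, saturate.
static short RoundingDoublingHigh16(short a, short b)
{
    long doubled = 2L * a * b + (1L << 15);
    long high = doubled >> 16; // arithmetic shift keeps the sign
    return (short)Math.Clamp(high, short.MinValue, short.MaxValue);
}

short[] left = { 1000, 2000, 3000, 4000 };
short[] right = { 30, 40, 50, 60 };

// "ByScalar": every lane of left is multiplied by right[0].
short[] result = new short[left.Length];
for (int i = 0; i < left.Length; i++)
{
    result[i] = RoundingDoublingHigh16(left[i], right[0]);
}
Console.WriteLine(string.Join(", ", result)); // 1, 2, 3, 4
```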
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector64<int> MultiplyRoundedDoublingByScalarSaturateHigh(Vector64<int> left, Vector64<int> right)
Vector128<short> MultiplyRoundedDoublingByScalarSaturateHigh(Vector128<short> left, Vector64<short> right)
Vector128<int> MultiplyRoundedDoublingByScalarSaturateHigh(Vector128<int> left, Vector64<int> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MultiplyRoundedDoublingByScalarSaturateHighTest(System.Runtime.Intrinsics.Vector64`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16]):System.Runtime.Intrinsics.Vector64`1[Int16]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
sqrdmulh v16.4h, v0.4h, v1.h[0]
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
35. MultiplyRoundedDoublingBySelectedScalarSaturateHigh
Vector64<short> MultiplyRoundedDoublingBySelectedScalarSaturateHigh(Vector64<short> left, Vector64<short> right, byte rightIndex)
This method multiplies each vector element in the left vector by the vector element at rightIndex of the right vector, doubles the results, stores the most significant half of each result in a vector, and returns the result vector. The results are rounded, and any result that overflows is saturated.
private Vector64<short> MultiplyRoundedDoublingBySelectedScalarSaturateHighTest(Vector64<short> left, Vector64<short> right, byte rightIndex)
{
    return AdvSimd.MultiplyRoundedDoublingBySelectedScalarSaturateHigh(left, right, 2);
}
// left = <1000, 2000, 3000, 4000>
// right = <30, 40, 50, 60>
// rightIndex = 2
// Result = <2, 3, 5, 6>
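Rounding is what separates this API from the non-rounding `MultiplyDoubling...SaturateHigh` family (`sqdmulh`). The two differ only by the 2^15 constant added before the high half is taken. The sketch below compares both on the example inputs with `rightIndex = 2`; `DoublingHigh16` is a hypothetical helper working on `short[]` arrays, written for illustration only.

```csharp
using System;

// Hypothetical model of the doubling multiply-high on 16-bit lanes.
// round: true  -> SQRDMULH (adds 2^15 before taking the high half)
// round: false -> SQDMULH  (truncates)
static short DoublingHigh16(short a, short b, bool round)
{
    long doubled = 2L * a * b + (round ? (1L << 15) : 0L);
    return (short)Math.Clamp(doubled >> 16, short.MinValue, short.MaxValue);
}

short[] left = { 1000, 2000, 3000, 4000 };
short[] right = { 30, 40, 50, 60 };
const int rightIndex = 2; // selects right[2] = 50

short[] rounded = new short[4];
short[] truncated = new short[4];
for (int i = 0; i < 4; i++)
{
    rounded[i] = DoublingHigh16(left[i], right[rightIndex], round: true);
    truncated[i] = DoublingHigh16(left[i], right[rightIndex], round: false);
}
Console.WriteLine(string.Join(", ", rounded));   // 2, 3, 5, 6
Console.WriteLine(string.Join(", ", truncated)); // 1, 3, 4, 6
```

Lanes 0 and 2 differ because their exact values (about 1.53 and 4.58) have a fractional part of at least one half.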
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector64<short> MultiplyRoundedDoublingBySelectedScalarSaturateHigh(Vector64<short> left, Vector128<short> right, byte rightIndex)
Vector64<int> MultiplyRoundedDoublingBySelectedScalarSaturateHigh(Vector64<int> left, Vector64<int> right, byte rightIndex)
Vector64<int> MultiplyRoundedDoublingBySelectedScalarSaturateHigh(Vector64<int> left, Vector128<int> right, byte rightIndex)
Vector128<short> MultiplyRoundedDoublingBySelectedScalarSaturateHigh(Vector128<short> left, Vector64<short> right, byte rightIndex)
Vector128<short> MultiplyRoundedDoublingBySelectedScalarSaturateHigh(Vector128<short> left, Vector128<short> right, byte rightIndex)
Vector128<int> MultiplyRoundedDoublingBySelectedScalarSaturateHigh(Vector128<int> left, Vector64<int> right, byte rightIndex)
Vector128<int> MultiplyRoundedDoublingBySelectedScalarSaturateHigh(Vector128<int> left, Vector128<int> right, byte rightIndex)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MultiplyRoundedDoublingBySelectedScalarSaturateHighTest(System.Runtime.Intrinsics.Vector64`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16],ubyte):System.Runtime.Intrinsics.Vector64`1[Int16]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;* V02 arg2 [V02 ] ( 0, 0 ) ubyte -> zero-ref
;# V03 OutArgs [V03 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
sqrdmulh v16.4h, v0.4h, v1.h[2]
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
36. MultiplyRoundedDoublingSaturateHigh
Vector64<short> MultiplyRoundedDoublingSaturateHigh(Vector64<short> left, Vector64<short> right)
This method multiplies corresponding elements of the left and right vectors, doubles the results, stores the most significant half of each result in a vector, and returns the result vector. The results are rounded, and any result that overflows is saturated.
private Vector64<short> MultiplyRoundedDoublingSaturateHighTest(Vector64<short> left, Vector64<short> right)
{
    return AdvSimd.MultiplyRoundedDoublingSaturateHigh(left, right);
}
// left = <1000, 2000, 3000, 4000>
// right = <30, 40, 50, 60>
// Result = <1, 2, 5, 7>
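For 16-bit lanes, the only input pair that can overflow is `-32768 * -32768`: doubling 2^30 gives 2^31, whose high half (32768) does not fit in a `short`. The hedged model below reproduces the element-wise example and then shows that saturating case. `RoundingDoublingHigh16` is a hypothetical helper operating on `short[]` arrays rather than `Vector64<short>`.

```csharp
using System;

// Hypothetical scalar model of SQRDMULH on one pair of 16-bit lanes (not a .NET API).
static short RoundingDoublingHigh16(short a, short b)
{
    long doubled = 2L * a * b + (1L << 15);
    return (short)Math.Clamp(doubled >> 16, short.MinValue, short.MaxValue);
}

short[] left = { 1000, 2000, 3000, 4000 };
short[] right = { 30, 40, 50, 60 };
short[] result = new short[4];
for (int i = 0; i < 4; i++)
{
    result[i] = RoundingDoublingHigh16(left[i], right[i]); // lane i times lane i
}
Console.WriteLine(string.Join(", ", result)); // 1, 2, 5, 7

// (2 * -32768 * -32768 + 2^15) >> 16 would be 32768, so the result saturates.
short saturated = RoundingDoublingHigh16(short.MinValue, short.MinValue);
Console.WriteLine(saturated); // 32767
```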
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector64<int> MultiplyRoundedDoublingSaturateHigh(Vector64<int> left, Vector64<int> right)
Vector128<short> MultiplyRoundedDoublingSaturateHigh(Vector128<short> left, Vector128<short> right)
Vector128<int> MultiplyRoundedDoublingSaturateHigh(Vector128<int> left, Vector128<int> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MultiplyRoundedDoublingSaturateHighTest(System.Runtime.Intrinsics.Vector64`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16]):System.Runtime.Intrinsics.Vector64`1[Int16]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
sqrdmulh v16.4h, v0.4h, v1.4h
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
37. MultiplyRoundedDoublingSaturateHighScalar
Vector64<short> MultiplyRoundedDoublingSaturateHighScalar(Vector64<short> left, Vector64<short> right)
This method multiplies the 0th elements of the left and right vectors, doubles the result, and places the most significant half of the result at index 0 of the result vector. Other vector elements are set to 0. The result is rounded, and it is saturated if it overflows.
private Vector64<short> MultiplyRoundedDoublingSaturateHighScalarTest(Vector64<short> left, Vector64<short> right)
{
    return AdvSimd.Arm64.MultiplyRoundedDoublingSaturateHighScalar(left, right);
}
// left = <11, 12, 13, 14>
// right = <10210, 20020, 230, 240>
// Result = <3, 0, 0, 0>
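The scalar form uses only lane 0 of each input. The sketch below models it on `short[]` arrays, an assumption made so it runs without Arm64 hardware, and shows why lanes 1 to 3 of the example result are zero. `RoundingDoublingHighScalar16` is a hypothetical helper, not a .NET API.

```csharp
using System;

// Hypothetical model of the scalar form: only lane 0 of each input is used and
// the remaining lanes of the result are zero.
static short[] RoundingDoublingHighScalar16(short[] left, short[] right)
{
    long doubled = 2L * left[0] * right[0] + (1L << 15);
    var lanes = new short[left.Length]; // new arrays start zeroed
    lanes[0] = (short)Math.Clamp(doubled >> 16, short.MinValue, short.MaxValue);
    return lanes;
}

short[] scalarResult = RoundingDoublingHighScalar16(
    new short[] { 11, 12, 13, 14 },
    new short[] { 10210, 20020, 230, 240 });
Console.WriteLine(string.Join(", ", scalarResult)); // 3, 0, 0, 0
```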
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd.Arm64
Vector64<int> MultiplyRoundedDoublingSaturateHighScalar(Vector64<int> left, Vector64<int> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MultiplyRoundedDoublingSaturateHighScalarTest(System.Runtime.Intrinsics.Vector64`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16]):System.Runtime.Intrinsics.Vector64`1[Int16]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
sqrdmulh h16, h0, h1
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
38. MultiplyRoundedDoublingScalarBySelectedScalarSaturateHigh
Vector64<short> MultiplyRoundedDoublingScalarBySelectedScalarSaturateHigh(Vector64<short> left, Vector64<short> right, byte rightIndex)
This method multiplies the 0th element of the left vector by the vector element at rightIndex of the right vector, doubles the result, places the most significant half of the result at index 0 of a vector, and returns that vector. Other vector elements are set to 0. If the result overflows, it is saturated. The result is rounded.
private Vector64<short> MultiplyRoundedDoublingScalarBySelectedScalarSaturateHighTest(Vector64<short> left, Vector64<short> right, byte rightIndex)
{
    return AdvSimd.Arm64.MultiplyRoundedDoublingScalarBySelectedScalarSaturateHigh(left, right, 0);
}
// left = <11, 12, 13, 14>
// right = <10000, 22, 23, 24>
// rightIndex = 0
// Result = <3, 0, 0, 0>
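The `Vector64<int>` overloads listed below follow the same recipe with 32-bit lanes: the rounding constant becomes 2^31 and the high half is the upper 32 bits. The sketch below is a hypothetical model of that variant; its inputs are illustrative and not taken from the example. It computes `(a*b + 2^30) >> 31`, which is algebraically identical to `(2*a*b + 2^31) >> 32` but cannot overflow a `long` when both inputs are `int.MinValue`.

```csharp
using System;

// Hypothetical model of the 32-bit scalar-by-selected-scalar form (not a .NET API).
// Only left[0] takes part; it is multiplied by right[rightIndex] and lane 1 stays zero.
static int[] RoundingDoublingHighScalarBySelected32(int[] left, int[] right, int rightIndex)
{
    long high = (Math.BigMul(left[0], right[rightIndex]) + (1L << 30)) >> 31;
    var lanes = new int[left.Length];
    lanes[0] = (int)Math.Clamp(high, int.MinValue, int.MaxValue);
    return lanes;
}

// 2 * 1,000,000 * 3,000,000 / 2^32 is about 1396.98, which rounds to 1397.
int[] rounded = RoundingDoublingHighScalarBySelected32(new[] { 1_000_000, 5 }, new[] { 7, 3_000_000 }, 1);
// int.MinValue * int.MinValue doubled does not fit, so lane 0 saturates to int.MaxValue.
int[] saturated = RoundingDoublingHighScalarBySelected32(new[] { int.MinValue, 0 }, new[] { int.MinValue, 0 }, 0);
Console.WriteLine($"{rounded[0]} {saturated[0]}"); // 1397 2147483647
```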
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd.Arm64
Vector64<short> MultiplyRoundedDoublingScalarBySelectedScalarSaturateHigh(Vector64<short> left, Vector128<short> right, byte rightIndex)
Vector64<int> MultiplyRoundedDoublingScalarBySelectedScalarSaturateHigh(Vector64<int> left, Vector64<int> right, byte rightIndex)
Vector64<int> MultiplyRoundedDoublingScalarBySelectedScalarSaturateHigh(Vector64<int> left, Vector128<int> right, byte rightIndex)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MultiplyRoundedDoublingScalarBySelectedScalarSaturateHighTest(System.Runtime.Intrinsics.Vector64`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16],ubyte):System.Runtime.Intrinsics.Vector64`1[Int16]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;* V02 arg2 [V02 ] ( 0, 0 ) ubyte -> zero-ref
;# V03 OutArgs [V03 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
sqrdmulh h16, h0, v1.h[0]
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
39. MultiplyScalar
Vector64<double> MultiplyScalar(Vector64<double> left, Vector64<double> right)
This method multiplies the floating-point values in the 0th elements of the left and right vectors and returns the result in a vector.
private Vector64<double> MultiplyScalarTest(Vector64<double> left, Vector64<double> right)
{
    return AdvSimd.MultiplyScalar(left, right);
}
// left = <11.5>
// right = <11.5>
// Result = <132.25>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector64<float> MultiplyScalar(Vector64<float> left, Vector64<float> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MultiplyScalarTest(System.Runtime.Intrinsics.Vector64`1[Double],System.Runtime.Intrinsics.Vector64`1[Double]):System.Runtime.Intrinsics.Vector64`1[Double]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
fmul d16, d0, d1
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
40. MultiplyScalarBySelectedScalar
Vector64<float> MultiplyScalarBySelectedScalar(Vector64<float> left, Vector64<float> right, byte rightIndex)
This method multiplies the 0th element of the left vector by the element at rightIndex in the right vector, stores the result at index 0 of a vector, and returns that vector. As the example below shows, the other element of the result is set to 0. All the values in this method are floating-point values.
private Vector64<float> MultiplyScalarBySelectedScalarTest(Vector64<float> left, Vector64<float> right, byte rightIndex)
{
    return AdvSimd.MultiplyScalarBySelectedScalar(left, right, 0);
}
// left = <11.5, 12.5>
// right = <21.5, 22.5>
// rightIndex = 0
// Result = <247.25, 0>
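A hedged model of the same computation on `float[]` arrays: lane 0 of the result is `left[0] * right[rightIndex]` and the upper lane is zero, matching the `<247.25, 0>` result above. `MultiplyScalarBySelectedModel` is a hypothetical helper; the second call simply selects the other element of `right`.

```csharp
using System;

// Hypothetical model of MultiplyScalarBySelectedScalar on Vector64<float>-sized arrays.
static float[] MultiplyScalarBySelectedModel(float[] left, float[] right, int rightIndex)
{
    var lanes = new float[left.Length]; // zero-initialized, so lane 1 stays 0
    lanes[0] = left[0] * right[rightIndex];
    return lanes;
}

float[] r0 = MultiplyScalarBySelectedModel(new[] { 11.5f, 12.5f }, new[] { 21.5f, 22.5f }, 0);
float[] r1 = MultiplyScalarBySelectedModel(new[] { 11.5f, 12.5f }, new[] { 21.5f, 22.5f }, 1);
// r0 holds { 247.25, 0 }; r1 holds { 258.75, 0 }
Console.WriteLine($"{r0[0]} {r0[1]} / {r1[0]} {r1[1]}");
```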
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector64<float> MultiplyScalarBySelectedScalar(Vector64<float> left, Vector128<float> right, byte rightIndex)
// class System.Runtime.Intrinsics.Arm.AdvSimd.Arm64
Vector64<double> MultiplyScalarBySelectedScalar(Vector64<double> left, Vector128<double> right, byte rightIndex)
See Microsoft docs here and here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MultiplyScalarBySelectedScalarTest(System.Runtime.Intrinsics.Vector64`1[Single],System.Runtime.Intrinsics.Vector64`1[Single],ubyte):System.Runtime.Intrinsics.Vector64`1[Single]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;* V02 arg2 [V02 ] ( 0, 0 ) ubyte -> zero-ref
;# V03 OutArgs [V03 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
fmul s16, s0, v1.s[0]
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
41. MultiplySubtract
Vector64<byte> MultiplySubtract(Vector64<byte> minuend, Vector64<byte> left, Vector64<byte> right)
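This method multiplies corresponding elements of the left and right vectors, subtracts the products from the corresponding elements of the minuend vector, and returns the result. The arithmetic wraps around on overflow, which is why the byte results in the example below are reduced modulo 256.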
private Vector64<byte> MultiplySubtractTest(Vector64<byte> minuend, Vector64<byte> left, Vector64<byte> right)
{
    return AdvSimd.MultiplySubtract(minuend, left, right);
}
// minuend = <11, 12, 13, 14, 15, 16, 17, 18>
// left = <21, 22, 23, 24, 25, 26, 27, 28>
// right = <31, 32, 33, 34, 35, 36, 37, 38>
// Result = <128, 76, 22, 222, 164, 104, 42, 234>
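The byte results above become clear once the wrap-around is taken into account: lane 0 is 11 - 21 * 31 = -640, which is 128 modulo 256. The model below is a sketch on `byte[]` arrays with a hypothetical `MultiplySubtractModel` helper; it reproduces all eight lanes using C#'s unchecked narrowing conversion, which keeps the low 8 bits just as the `mls` instruction does.

```csharp
using System;

// Hypothetical model of MultiplySubtract on byte lanes: minuend - left * right, modulo 256.
static byte[] MultiplySubtractModel(byte[] minuend, byte[] left, byte[] right)
{
    var lanes = new byte[minuend.Length];
    for (int i = 0; i < lanes.Length; i++)
    {
        // The subtraction is done in int, then only the low 8 bits are kept.
        lanes[i] = unchecked((byte)(minuend[i] - left[i] * right[i]));
    }
    return lanes;
}

byte[] result = MultiplySubtractModel(
    new byte[] { 11, 12, 13, 14, 15, 16, 17, 18 },
    new byte[] { 21, 22, 23, 24, 25, 26, 27, 28 },
    new byte[] { 31, 32, 33, 34, 35, 36, 37, 38 });
Console.WriteLine(string.Join(", ", result)); // 128, 76, 22, 222, 164, 104, 42, 234
```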
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector64<short> MultiplySubtract(Vector64<short> minuend, Vector64<short> left, Vector64<short> right)
Vector64<int> MultiplySubtract(Vector64<int> minuend, Vector64<int> left, Vector64<int> right)
Vector64<sbyte> MultiplySubtract(Vector64<sbyte> minuend, Vector64<sbyte> left, Vector64<sbyte> right)
Vector64<ushort> MultiplySubtract(Vector64<ushort> minuend, Vector64<ushort> left, Vector64<ushort> right)
Vector64<uint> MultiplySubtract(Vector64<uint> minuend, Vector64<uint> left, Vector64<uint> right)
Vector128<byte> MultiplySubtract(Vector128<byte> minuend, Vector128<byte> left, Vector128<byte> right)
Vector128<short> MultiplySubtract(Vector128<short> minuend, Vector128<short> left, Vector128<short> right)
Vector128<int> MultiplySubtract(Vector128<int> minuend, Vector128<int> left, Vector128<int> right)
Vector128<sbyte> MultiplySubtract(Vector128<sbyte> minuend, Vector128<sbyte> left, Vector128<sbyte> right)
Vector128<ushort> MultiplySubtract(Vector128<ushort> minuend, Vector128<ushort> left, Vector128<ushort> right)
Vector128<uint> MultiplySubtract(Vector128<uint> minuend, Vector128<uint> left, Vector128<uint> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MultiplySubtractTest(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector64`1[Byte]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
; V02 arg2 [V02,T02] ( 3, 3 ) simd8 -> d2 HFA(simd8)
;# V03 OutArgs [V03 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
mls v0.8b, v1.8b, v2.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 20, prolog size 8
42. MultiplySubtractByScalar
Vector64<short> MultiplySubtractByScalar(Vector64<short> minuend, Vector64<short> left, Vector64<short> right)
This method multiplies the vector elements in the left vector by the 0th element of the right vector, subtracts the products from the corresponding elements of the minuend vector, and returns the result.
private Vector64<short> MultiplySubtractByScalarTest(Vector64<short> minuend, Vector64<short> left, Vector64<short> right)
{
    return AdvSimd.MultiplySubtractByScalar(minuend, left, right);
}
// minuend = <11, 12, 13, 14>
// left = <21, 22, 23, 24>
// right = <31, 32, 33, 34>
// Result = <-640, -670, -700, -730>
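The `short` results above stay in range, but like `MultiplySubtract` this API does not saturate, so the unsigned overloads wrap instead of going negative. This hypothetical model on `short[]` arrays reuses `right[0]` for every lane and then shows what lane 0 would become on `ushort` lanes; `MultiplySubtractByScalarModel` is an illustrative name, not a .NET API.

```csharp
using System;

// Hypothetical model of MultiplySubtractByScalar on short lanes.
static short[] MultiplySubtractByScalarModel(short[] minuend, short[] left, short[] right)
{
    var lanes = new short[minuend.Length];
    for (int i = 0; i < lanes.Length; i++)
    {
        // right[0] is used for every lane; the result wraps modulo 2^16.
        lanes[i] = unchecked((short)(minuend[i] - left[i] * right[0]));
    }
    return lanes;
}

short[] result = MultiplySubtractByScalarModel(
    new short[] { 11, 12, 13, 14 },
    new short[] { 21, 22, 23, 24 },
    new short[] { 31, 32, 33, 34 });
// result holds { -640, -670, -700, -730 }

// On ushort lanes the same lane-0 computation wraps: 65536 - 640.
ushort wrapped = unchecked((ushort)(11 - 21 * 31));
Console.WriteLine(wrapped); // 64896
```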
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector64<int> MultiplySubtractByScalar(Vector64<int> minuend, Vector64<int> left, Vector64<int> right)
Vector64<ushort> MultiplySubtractByScalar(Vector64<ushort> minuend, Vector64<ushort> left, Vector64<ushort> right)
Vector64<uint> MultiplySubtractByScalar(Vector64<uint> minuend, Vector64<uint> left, Vector64<uint> right)
Vector128<short> MultiplySubtractByScalar(Vector128<short> minuend, Vector128<short> left, Vector64<short> right)
Vector128<int> MultiplySubtractByScalar(Vector128<int> minuend, Vector128<int> left, Vector64<int> right)
Vector128<ushort> MultiplySubtractByScalar(Vector128<ushort> minuend, Vector128<ushort> left, Vector64<ushort> right)
Vector128<uint> MultiplySubtractByScalar(Vector128<uint> minuend, Vector128<uint> left, Vector64<uint> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MultiplySubtractByScalarTest(System.Runtime.Intrinsics.Vector64`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16]):System.Runtime.Intrinsics.Vector64`1[Int16]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
; V02 arg2 [V02,T02] ( 3, 3 ) simd8 -> d2 HFA(simd8)
;# V03 OutArgs [V03 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
mls v0.4h, v1.4h, v2.h[0]
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 20, prolog size 8
43. MultiplySubtractBySelectedScalar
Vector64<short> MultiplySubtractBySelectedScalar(Vector64<short> minuend, Vector64<short> left, Vector64<short> right, byte rightIndex)
This method multiplies the vector elements in the left vector by the element at rightIndex of the right vector, subtracts the products from the corresponding elements of the minuend vector, and returns the result.
private Vector64<short> MultiplySubtractBySelectedScalarTest(Vector64<short> minuend, Vector64<short> left, Vector64<short> right, byte rightIndex)
{
    return AdvSimd.MultiplySubtractBySelectedScalar(minuend, left, right, 2);
}
// minuend = <11, 12, 13, 14>
// left = <21, 22, 23, 24>
// right = <31, 32, 33, 34>
// rightIndex = 2
// Result = <-682, -714, -746, -778>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector64<short> MultiplySubtractBySelectedScalar(Vector64<short> minuend, Vector64<short> left, Vector128<short> right, byte rightIndex)
Vector64<int> MultiplySubtractBySelectedScalar(Vector64<int> minuend, Vector64<int> left, Vector64<int> right, byte rightIndex)
Vector64<int> MultiplySubtractBySelectedScalar(Vector64<int> minuend, Vector64<int> left, Vector128<int> right, byte rightIndex)
Vector64<ushort> MultiplySubtractBySelectedScalar(Vector64<ushort> minuend, Vector64<ushort> left, Vector64<ushort> right, byte rightIndex)
Vector64<ushort> MultiplySubtractBySelectedScalar(Vector64<ushort> minuend, Vector64<ushort> left, Vector128<ushort> right, byte rightIndex)
Vector64<uint> MultiplySubtractBySelectedScalar(Vector64<uint> minuend, Vector64<uint> left, Vector64<uint> right, byte rightIndex)
Vector64<uint> MultiplySubtractBySelectedScalar(Vector64<uint> minuend, Vector64<uint> left, Vector128<uint> right, byte rightIndex)
Vector128<short> MultiplySubtractBySelectedScalar(Vector128<short> minuend, Vector128<short> left, Vector64<short> right, byte rightIndex)
Vector128<short> MultiplySubtractBySelectedScalar(Vector128<short> minuend, Vector128<short> left, Vector128<short> right, byte rightIndex)
Vector128<int> MultiplySubtractBySelectedScalar(Vector128<int> minuend, Vector128<int> left, Vector64<int> right, byte rightIndex)
Vector128<int> MultiplySubtractBySelectedScalar(Vector128<int> minuend, Vector128<int> left, Vector128<int> right, byte rightIndex)
Vector128<ushort> MultiplySubtractBySelectedScalar(Vector128<ushort> minuend, Vector128<ushort> left, Vector64<ushort> right, byte rightIndex)
Vector128<ushort> MultiplySubtractBySelectedScalar(Vector128<ushort> minuend, Vector128<ushort> left, Vector128<ushort> right, byte rightIndex)
Vector128<uint> MultiplySubtractBySelectedScalar(Vector128<uint> minuend, Vector128<uint> left, Vector64<uint> right, byte rightIndex)
Vector128<uint> MultiplySubtractBySelectedScalar(Vector128<uint> minuend, Vector128<uint> left, Vector128<uint> right, byte rightIndex)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MultiplySubtractBySelectedScalarTest(System.Runtime.Intrinsics.Vector64`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16],ubyte):System.Runtime.Intrinsics.Vector64`1[Int16]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
; V02 arg2 [V02,T02] ( 3, 3 ) simd8 -> d2 HFA(simd8)
;* V03 arg3 [V03 ] ( 0, 0 ) ubyte -> zero-ref
;# V04 OutArgs [V04 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
mls v0.4h, v1.4h, v2.h[2]
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 20, prolog size 8
44. MultiplyWideningLower
Vector128<ushort> MultiplyWideningLower(Vector64<byte> left, Vector64<byte> right)
This method multiplies corresponding vector elements in the left and right vectors, stores the results in a vector, and returns the result vector. As the example below shows, each result element (ushort) is twice the size of an input element (byte), so the products never overflow.
private Vector128<ushort> MultiplyWideningLowerTest(Vector64<byte> left, Vector64<byte> right)
{
    return AdvSimd.MultiplyWideningLower(left, right);
}
// left = <11, 12, 13, 14, 15, 16, 17, 18>
// right = <21, 22, 23, 24, 25, 26, 27, 28>
// Result = <231, 264, 299, 336, 375, 416, 459, 504>
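Widening is what keeps these products exact: the largest possible `byte` product, 255 * 255 = 65025, still fits in a `ushort`. The hedged model below uses a hypothetical `MultiplyWideningLowerModel` helper on arrays to reproduce the example, then contrasts it with a product kept in a byte lane, which would wrap.

```csharp
using System;

// Hypothetical model of MultiplyWideningLower: byte lanes in, ushort lanes out.
static ushort[] MultiplyWideningLowerModel(byte[] a, byte[] b)
{
    var lanes = new ushort[a.Length];
    for (int i = 0; i < lanes.Length; i++)
    {
        lanes[i] = (ushort)(a[i] * b[i]); // at most 255 * 255 = 65025, so it always fits
    }
    return lanes;
}

byte[] left = { 11, 12, 13, 14, 15, 16, 17, 18 };
byte[] right = { 21, 22, 23, 24, 25, 26, 27, 28 };
ushort[] widened = MultiplyWideningLowerModel(left, right);
Console.WriteLine(string.Join(", ", widened)); // 231, 264, 299, 336, 375, 416, 459, 504

// Keeping the product in a byte lane instead would wrap: 12 * 22 = 264 becomes 8.
byte narrow = unchecked((byte)(left[1] * right[1]));
Console.WriteLine(narrow); // 8
```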
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector128<int> MultiplyWideningLower(Vector64<short> left, Vector64<short> right)
Vector128<long> MultiplyWideningLower(Vector64<int> left, Vector64<int> right)
Vector128<short> MultiplyWideningLower(Vector64<sbyte> left, Vector64<sbyte> right)
Vector128<uint> MultiplyWideningLower(Vector64<ushort> left, Vector64<ushort> right)
Vector128<ulong> MultiplyWideningLower(Vector64<uint> left, Vector64<uint> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MultiplyWideningLowerTest(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector128`1[UInt16]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
umull v16.8h, v0.8b, v1.8b
mov v0.16b, v16.16b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
45. MultiplyWideningLowerAndAdd
Vector128<ushort> MultiplyWideningLowerAndAdd(Vector128<ushort> addend, Vector64<byte> left, Vector64<byte> right)
This method multiplies the vector elements in the left vector by the corresponding vector elements of the right vector, adds the products to the corresponding elements of the addend vector, and returns the result vector. As the example below shows, each result element (ushort) is twice the size of an input element (byte).
private Vector128<ushort> MultiplyWideningLowerAndAddTest(Vector128<ushort> addend, Vector64<byte> left, Vector64<byte> right)
{
    return AdvSimd.MultiplyWideningLowerAndAdd(addend, left, right);
}
// addend = <11, 12, 13, 14, 15, 16, 17, 18>
// left = <11, 12, 13, 14, 15, 16, 17, 18>
// right = <21, 22, 23, 24, 25, 26, 27, 28>
// Result = <242, 276, 312, 350, 390, 432, 476, 522>
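A hedged array-based model of the widening multiply-accumulate: each byte product is widened before it is added to the `ushort` addend. `MultiplyWideningLowerAndAddModel` is a hypothetical helper. The final cast models the accumulator wrapping modulo 65536 on overflow, since `umlal` does not saturate.

```csharp
using System;

// Hypothetical model of MultiplyWideningLowerAndAdd: addend[i] + a[i] * b[i] on ushort lanes.
static ushort[] MultiplyWideningLowerAndAddModel(ushort[] addend, byte[] a, byte[] b)
{
    var lanes = new ushort[addend.Length];
    for (int i = 0; i < lanes.Length; i++)
    {
        // The product is computed in int (already wide enough), then the sum wraps to 16 bits.
        lanes[i] = unchecked((ushort)(addend[i] + a[i] * b[i]));
    }
    return lanes;
}

ushort[] result = MultiplyWideningLowerAndAddModel(
    new ushort[] { 11, 12, 13, 14, 15, 16, 17, 18 },
    new byte[] { 11, 12, 13, 14, 15, 16, 17, 18 },
    new byte[] { 21, 22, 23, 24, 25, 26, 27, 28 });
Console.WriteLine(string.Join(", ", result)); // 242, 276, 312, 350, 390, 432, 476, 522
```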
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector128<int> MultiplyWideningLowerAndAdd(Vector128<int> addend, Vector64<short> left, Vector64<short> right)
Vector128<long> MultiplyWideningLowerAndAdd(Vector128<long> addend, Vector64<int> left, Vector64<int> right)
Vector128<short> MultiplyWideningLowerAndAdd(Vector128<short> addend, Vector64<sbyte> left, Vector64<sbyte> right)
Vector128<uint> MultiplyWideningLowerAndAdd(Vector128<uint> addend, Vector64<ushort> left, Vector64<ushort> right)
Vector128<ulong> MultiplyWideningLowerAndAdd(Vector128<ulong> addend, Vector64<uint> left, Vector64<uint> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MultiplyWideningLowerAndAddTest(System.Runtime.Intrinsics.Vector128`1[UInt16],System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector128`1[UInt16]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd16 -> d0 HFA(simd16)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
; V02 arg2 [V02,T02] ( 3, 3 ) simd8 -> d2 HFA(simd8)
;# V03 OutArgs [V03 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
umlal v0.8h, v1.8b, v2.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 20, prolog size 8